Crystal Structure Of Beta Site APP Cleaving Enzyme (BACE) And Methods Of Use Thereof

Related Applications 

This application claims priority to U.S. Provisional Patent Application Serial Number 
60/398,681 filed July 26, 2002, and corresponds to International Patent Application number
(Attorney docket number AHB/CP6162168) filed July 25, 2003. 

All documents cited in this text, and all documents cited or referenced in documents cited in 
this text, and any manufacturer's instructions or catalogues for any products cited or 
mentioned in this text or in any document hereby incorporated into this text, are hereby
incorporated herein by reference. Documents incorporated by reference into this text or any
teachings therein may be used in the practice of this invention. Documents incorporated by 
reference into this text are not admitted to be prior art. Furthermore, authors or inventors on 
documents incorporated by reference into this text are not to be considered to be "another"
or "others" as to the present inventive entity and vice versa, especially where one or more
authors or inventors on documents incorporated by reference into this text are an inventor or
inventors named in the present inventive entity. 

Field of the Invention 

The present invention relates to mutant BACE proteins, recombinant BACE proteins, and
processes for crystallizing BACE, and in particular to the crystal structure of BACE and to
the uses of this structure in drug discovery.

Background to the Invention 

Alzheimer's disease 

Alzheimer's disease (AD) is estimated to afflict more than 20 million people worldwide and 
is believed to be the most common form of dementia. Alzheimer's disease is a progressive 
dementia in which massive deposits of aggregated protein breakdown products (amyloid
plaques and neurofibrillary tangles) accumulate in the brain. The amyloid plaques are
thought to be responsible for the mental decline seen in Alzheimer's patients.

Aβ or amyloid-β-protein is the major constituent of the plaques which are characteristic of
Alzheimer's disease (De Strooper et al, 1999). Aβ is a 39-42 residue peptide formed by the
specific cleavage of a class I transmembrane protein called APP, or amyloid precursor
protein. A β-secretase activity cleaves this protein between residues Met671 and Asp672
(numbering of 770aa isoform of APP) to form the N-terminus of Aβ. A second cleavage of
the peptide is associated with γ-secretase to form the C-terminus of the Aβ peptide.
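The residue arithmetic implied by the paragraph above can be sketched as follows. This is an illustration only: it assumes the APP770 numbering quoted in the text and simply counts forward from Asp672; the function name is hypothetical.

```python
# Illustrative arithmetic only; numbering follows the 770-residue APP isoform
# quoted in the text. The helper name is an assumption made for this sketch.
BETA_CUT_AFTER = 671  # the beta-site cleavage falls between Met671 and Asp672


def abeta_span(length):
    """Return (first, last) APP770 residue numbers for a peptide of `length` residues
    whose N-terminus is created by the beta-site cleavage."""
    first = BETA_CUT_AFTER + 1
    last = first + length - 1
    return first, last


if __name__ == "__main__":
    for n in (39, 40, 42):
        print(n, abeta_span(n))
```

Under these assumptions a 39-42 residue peptide spans Asp672 to a C-terminus between residues 710 and 713.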

Beta Site APP Cleaving Enzyme (BACE) and Alzheimer's Disease

Several groups have identified and isolated aspartate proteases that have β-secretase activity
(Hussain et al., 1999; Lin et al., 2000; Yan et al., 1999; Sinha et al., 1999 and Vassar et al.,
1999). β-secretase is also known in the literature as Asp2 (Yan et al., 1999), Beta site APP
Cleaving Enzyme (BACE or BACE1) (Vassar et al., 1999) or memapsin-2 (Lin et al.,
2000). BACE was identified using a number of experimental approaches such as EST
database analysis (Hussain et al. 1999); expression cloning (Vassar et al. 1999);
identification of human homologs from public databases of predicted C. elegans proteins
(Yan et al. 1999) and finally utilizing an inhibitor to purify the protein from human brain
(Sinha et al. 1999). Thus, five groups employing three different experimental approaches
led to the identification of the same enzyme, making a strong case that BACE is a
β-secretase. Mention is also made of the patent literature: WO96/40885, EP871720, U.S.
Patents Nos. 5,942,400 and 5,744,346, EP855444, US 6,319,689, WO99/64587,
WO99/31236, EP1037977, WO00/17369, WO01/23533, WO00/47618, WO00/58479,
WO00/69262, WO01/00663, WO01/00665, US 6,313,268.

BACE is a membrane bound type 1 protein that is synthesized as a partially active 
proenzyme, and is abundantly expressed in brain tissue. It is thought to represent the major 
β-secretase activity, and is considered to be the rate-limiting step in the production of Aβ. It
is thus of special interest in the pathology of Alzheimer's disease, and in the development of
drugs as a treatment for Alzheimer's disease. 

BACE was found to be a pepsin-like aspartyl proteinase, the mature enzyme consisting of 
the N-terminal catalytic domain, a transmembrane domain, and a small cytoplasmic domain. 
BACE has an optimum activity at pH 4.0-5.0 (Vassar et al, 1999) and is inhibited weakly by 
standard pepsin inhibitors such as pepstatin. It has been shown that the catalytic domain
minus the transmembrane and cytoplasmic domains has activity against substrate peptides
(Lin et al, 2000). Consequently, this soluble catalytic domain is suitable for crystallization 
studies and a crystal structure of this will give a representative structure of the BACE active 
site for the design of inhibitor molecules. 

The likelihood of developing Alzheimer's disease increases with age, and as the aging
population of the developed world increases, this disease becomes a greater and greater 
problem. In addition to this, there is a familial link to Alzheimer's disease and consequently 
any individuals possessing the double mutation of APP known as the Swedish mutation (in 
which the mutated APP forms a considerably improved substrate for BACE) have a much
greater chance of developing AD, and also of developing it at an early age (see also US
6,245,964 and US 5,877,399 pertaining to transgenic rodents comprising APP-Swedish).
Consequently there is a strong case for developing a compound that can be used in a 
prophylactic fashion for these individuals. 

Hence, drugs that reduce or block BACE activity would reduce Aβ levels and levels of
fragments of Aβ in the brain or elsewhere where Aβ or fragments thereof deposit and thus
slow the formation of amyloid plaques and the progression of AD or other maladies
involving deposition of Aβ or fragments thereof (Yankner, 1996; De Strooper and Konig,
1999). BACE is therefore an important candidate for the development of drugs as a 
treatment against Alzheimer's disease and/or against such other maladies. 

The therapeutic potential of inhibiting the deposition of Aβ has motivated many groups to
isolate and characterize secretase enzymes and to identify their potential inhibitors (see,
e.g., WO01/23533 A2, EP0855444, WO00/17369, WO00/58479, WO00/47618,
WO00/77030, WO01/00665, WO01/00663, WO01/29563, WO02/25276, US5,942,400,
US6,245,884, US6,221,667, US6,211,235, WO02/02505, WO02/02506, WO02/02512,
WO02/02518, WO02/02520, WO02/14264).

The gene encoding APP is found on chromosome 21, which is also the chromosome found
as an extra copy in Down's syndrome. Down's syndrome patients tend to acquire Alzheimer's
disease at an early age, with almost all those over 40 years of age showing Alzheimer's-type
pathology (Oyama et al., 1994). This is thought to be due to the extra copy of the APP gene
found in these patients, which leads to overexpression of APP and therefore to increased
levels of APPβ, causing the high prevalence of Alzheimer's disease seen in this population.
Thus inhibitors of BACE could be useful in reducing Alzheimer's-type pathology in Down's
syndrome patients.

It would therefore be useful to inhibit the deposition of Aβ and portions thereof by
inhibiting BACE through inhibitors designed from the BACE structure as provided herein.
The determination of the three-dimensional structure of BACE provides a basis for the
design of new and specific ligands for BACE. For example, knowing the three-dimensional
structure of BACE, computer modelling programs may be used to design different
molecules expected to interact with possible or confirmed binding cavities or other
structural or functional features of BACE, or structure-based design approaches may be used
such as those described in Blundell et al (Nature Reviews, Drug Discovery, Vol 1, pg 45-54,
2002).

Ideally it would be desirable to have an abundant supply of this enzyme in homogeneous
form. It would also be preferable to solve the structure of a form of BACE with an
unoccupied active site. This could be used to soak in small molecule inhibitors of the
enzyme and to investigate their binding modes. We describe here the high yielding
production of BACE from bacterial cells in homogeneous form, and the generation of protein
suitable for crystallisation and structure determination of BACE in apo form.

Protein Crystallisation 

It is well known in the art of protein chemistry that crystallising a protein is an uncertain
and difficult process without any clear expectation of success. It is now evident that protein
crystallization is the main hurdle in protein structure determination. For this reason, protein
crystallization has become a research subject in and of itself, and is not simply an extension
of the protein crystallographer's laboratory. There are many references that describe the
difficulties associated with growing protein crystals (Kierzek AM. and Zielenkiewicz P.
(2001) Biophysical Chemistry 91 1-20 Models of protein crystal growth; Wiencek JM
(1999) Annu Rev Biomed Eng 1 505-534 New Strategies for crystal growth).

The reasons why it is commonly held that crystallization of protein molecules from solution 
is the major obstacle in the process of determining protein structures are many; proteins are 
complex molecules, and the delicate balance involving specific and non-specific
interactions with other protein molecules and small molecules in solution is difficult to
predict. 

Each protein crystallizes under a unique set of conditions, which cannot be predicted in
advance. Simply supersaturating the protein to bring it out of solution will not work; the
result would, in most cases, be an amorphous precipitate. Many precipitating agents are
used; common ones are different salts and polyethylene glycols, but others are known. In
addition, additives such as metals and detergents can be added to modulate the behaviour of
the protein in solution. Many kits are available (e.g., from Hampton Research), which
attempt to cover as many parameters in crystallization space as possible, but in many cases
these are just a starting point to optimize crystalline precipitates and crystals which are
unsuitable for diffraction analysis. Successful crystallization is aided by knowledge of the
protein's behaviour in terms of solubility, dependence on metal ions for correct folding or
activity, interactions with other molecules and any other information that is available. Even
so, crystallization of proteins is often regarded as a time-consuming process, whereby
subsequent experiments build on observations of past trials.

In cases where protein crystals are obtained, these are not necessarily always suitable for
diffraction analysis; they may be limited in resolution, and it may subsequently be difficult
to improve them to the point at which they will diffract to the resolution required for
analysis. Limited resolution in a crystal can have several causes. It may be due to
intrinsic mobility of the protein within the crystal; this can be difficult to overcome, even
with other crystal forms. It may be due to high solvent content within the crystal, which
consequently results in weak scattering. Alternatively, it could be due to defects within the
crystal lattice, which mean that the diffracted X-rays will not be completely in phase from
unit to unit within the lattice. Any one of these or a combination of these could mean that
the crystals are not suitable for structure determination.

Some proteins never crystallize, and after a reasonable attempt it is necessary to examine 
the protein itself and consider whether it is possible to make individual domains, different N-
or C-terminal truncations, or point mutations. It is often hard to predict how a protein could
be re-engineered in such a manner as to improve crystallisability. Sometimes the inclusion
of a ligand in the crystallisation mixture is essential for the production of suitable crystals. Our
understanding of crystallisation mechanisms is still incomplete and the factors of protein
structure which are involved in crystallisation are not well known.

BACE Production for Crystallisation 

Beta secretase (BACE) is an integral membrane protein containing a signal sequence, a
pro-peptide, a catalytic aspartyl protease domain, a transmembrane region and a C-terminal
cytoplasmic region. During transit through the endoplasmic reticulum, Golgi apparatus and
trans Golgi network the pro-peptide is cleaved by a furin-like protease (Bennett et al 2000,
Creemers et al 2001) and N-glycosylation is added and matured (Haniu et al 2000). The
protein contains 4 potential N-linked glycosylation sites, all of which are used (Bennett et
al, 2000).

Certain active recombinant BACEs - different from those of the present invention - have
been produced using heterologous expression systems for mammalian cells (Vassar et al,
1999, Hussain et al, 1999), insect cells (Mallender et al, 2001) and bacterial cells (Lin et al
2000). Preferred constructs for crystallisation would be soluble and lack glycosylation: the
former can be achieved by C-terminal truncation of the protein to remove the
transmembrane and cytoplasmic regions; while glycosylation could be removed either by
use of a deglycosylating agent such as PNGase F, by expression of the protein in bacteria or
by mutation of the glycosylation sites.

The protein used for BACE crystallisation by Hong et al (2000) was produced in bacteria 
and was truncated at the C-terminus. Their protein was produced as insoluble inclusion
bodies and required refolding to give soluble, active protein. Refolding of BACE is made 
more complex by the presence of 3 disulphide bonds in the native protease domain, which 
require careful control of redox conditions to form during in-vitro refolding. The protein 
produced by Hong et al was a mixture of products and was crystallised with inhibitor bound 
(see WO 01/00663, WO 01/00665, and US 6,545,127).

Mention is also made of WO 02/25276, which describes the crystallisation of BACE
produced in mammalian cells. The protein produced also was a mixture of protein species 
and was also crystallized with an inhibitor bound.

Mention is also made of WO03/012089, which describes the crystallisation of BACE
produced from insect cells. The co-ordinates of BACE with an inhibitor bound are 
provided. 

Summary of the Invention 

In general aspects, the present invention is concerned with the provision of a new, high
resolution, apo, crystal form of BACE and the use of this structure in identifying or 
obtaining agent compounds (especially inhibitors of BACE) for modulating BACE activity, 
and in preferred embodiments identifying or obtaining actual agent compounds/inhibitors. 
Crystal structure information presented herein is useful in designing potential inhibitors and
modelling them or their potential interaction with the BACE binding cavity. Potential
inhibitors may be brought into contact with BACE to test for ability to interact with the 
BACE binding cavity. Actual inhibitors may be identified from among potential inhibitors 
synthesized following design and model work performed in silico. An inhibitor identified 
using the present invention may be formulated into a composition, for instance a
composition comprising a pharmaceutically acceptable excipient, and may be used in the
manufacture of a medicament for use in a method of treatment. 

Thus, according to a first aspect of the present invention there is provided a mutant BACE 
protein, which protein lacks one or more proteolytic cleavage sites recognized by clostripain 
(or another protease which recognizes the same cleavage site as clostripain). In particular,
the protein is a BACE protein, which comprises the sequence set out in residues 45 to 455
of SEQ ID NO:2 (43 to 453 SwissProt P56817), or a fragment thereof comprising residues 
corresponding to 58 to 398 of SEQ ID NO:2, modified by the following changes: (a) 
substitution or deletion of at least one residue which is a proteolytic cleavage site 
recognised by clostripain; and (b) optionally the replacement of from 1 to 30 other amino
acids by an equivalent or fewer number of amino acids. It will be understood that when the
BACE protein comprises a fragment as defined above, the fragment will comprise at least 
feature (a) and optionally feature (b). 

The modification is such that the BACE protein preferably retains at least one proteolytic
cleavage site recognised by clostripain, so that cleavage occurs at a homogeneous
location.

According to a second aspect of the present invention there is provided a mutant BACE
protein which is truncated at the N-terminal up to and including R42, R45, G55, R56 or
R57. In a preferred aspect, when the protein is truncated up to and including R56 the residue
at position 57 is not arginine. It may for example be lysine.
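The idea of removing all but one clostripain site can be sketched as below. This is a minimal illustration, not the method used to design the sequences of the invention: it assumes that clostripain cleaves on the carboxyl side of arginine, it uses lysine as the replacement residue (as the text suggests for position 57), and the toy sequence and function names are hypothetical and unrelated to BACE.

```python
# Sketch only. Assumption: clostripain cleaves after arginine (R).
# The sequence below is a made-up toy peptide, not part of any BACE sequence.

def arg_sites(seq):
    """Return 1-based positions of arginine residues (candidate cleavage sites)."""
    return [i for i, aa in enumerate(seq, start=1) if aa == "R"]


def keep_one_site(seq, keep, replacement="K"):
    """Replace every arginine except the one at 1-based position `keep`."""
    if seq[keep - 1] != "R":
        raise ValueError("position %d is not arginine" % keep)
    return "".join(
        replacement if aa == "R" and i != keep else aa
        for i, aa in enumerate(seq, start=1)
    )


if __name__ == "__main__":
    toy = "MARTGRLPRE"
    print(arg_sites(toy))                      # candidate sites in the toy peptide
    mutant = keep_one_site(toy, keep=6)
    print(mutant, arg_sites(mutant))           # only the retained site remains
```

With a single retained site, digestion would be expected to give one defined N-terminus rather than a mixture of species.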

In a third aspect the invention provides a mutant BACE protein selected from: (a) SEQ ID
6; (b) SEQ ID 8; (c) SEQ ID 10; (d) SEQ ID 12; (e) SEQ ID 14; (f) SEQ ID 16; (g) SEQ
ID 18; (h) SEQ ID 19; (i) SEQ ID 20; (j) SEQ ID 21.

In another aspect, the invention contemplates a nucleic acid (e.g. DNA or RNA) sequence 
encoding the BACE protein of the invention, as well as the complementary nucleic acid 
sequence counterpart.

The nucleic acids of the invention may be isolated, or may be present in the context of a 
vector or host cell. Thus, in another aspect, the invention contemplates a vector comprising 
the nucleic acid of the invention. 

The nature of the vector of the invention is not critical to the invention. Any suitable vector 
may be used, including expression vectors, plasmid, virus, bacteriophage, transposon,
minichromosome, liposome or mechanical carrier. 

The expression vectors of the invention are DNA constructs suitable for expressing DNA 
which encodes the desired peptide and which may include: (a) a regulatory element (e.g. a 
promoter, operator, activator, repressor and/or enhancer), (b) a structural or coding sequence 
which is transcribed into mRNA and (c) appropriate transcription, translation, initiation and
termination sequences. They may also contain sequence encoding any of various tags (e.g.
to facilitate subsequent purification of the expressed protein, such as affinity tags, e.g. His tags).

Particularly preferred are vectors which comprise an expression element or elements 
operably linked to the DNA of the invention to provide for expression thereof at suitable 
levels. Any of a wide variety of expression elements may be used, and the expression
element or elements may for example be selected from promoters, enhancers, ribosome 
binding sites, operators and activating sequences. Such expression elements may comprise 
an enhancer, and for example may be regulatable, for example being inducible (via the 
addition of an inducer).

The vector may further comprise a positive selectable marker and/or a negative selectable
marker. The use of a positive selectable marker facilitates the selection and/or identification 
of cells containing the vector. 

In another aspect, the invention contemplates a host cell comprising the vector of the 
invention. The nucleic acid of the invention may be introduced into the host cell by any of
a large number of convenient methods, including calcium phosphate transfection, DEAE- 
Dextran mediated transfection, electroporation or any other method known in the art. 

Any suitable host cell may be used, including prokaryotic host cells (such as Escherichia 
coli, Streptomyces spp. and Bacillus subtilis) and eukaryotic host cells. Suitable eukaryotic 
host cells include insect cells (e.g. using the baculovirus expression system), mammalian
cells, fungal (e.g. yeast) cells and plant cells. Preferred mammalian cells are animal cells
such as CHO, COS, C127, 3T3, HeLa, HEK 293, NIH 3T3, BHK and Bowes melanoma
(particularly preferred being CHO-K1, COS7, Y1 adrenal and carcinoma cells).

Cell-free translation systems can also be used to produce the peptides of the invention. 

Appropriate cloning and expression vectors for use with prokaryotic and eukaryotic hosts
are described in Sambrook et al., Molecular Cloning: A Laboratory Manual, Second
Edition, Cold Spring Harbor, N.Y., (1989). 

Prokaryotic host cells are preferred in circumstances where the BACE protein is required in 
an unglycosylated state. 

According to another aspect of the invention there is provided a process for producing the
BACE protein of the invention comprising the steps of: (a) culturing the host cell of the 
invention under conditions suitable for expression of the BACE protein; and optionally (b) 
isolating the expressed recombinant BACE protein. 

In a further aspect the invention provides a method of making BACE protein which 
comprises proteolytically cleaving a BACE protein which lacks one or more proteolytic
cleavage sites as described above, the cleavage desirably occurring at (and including) one of
position 42, 45, 55, 56 or 57, preferably 42, 56 or 57. Clostripain, or another protease 
which recognises the same cleavage site as clostripain, may be used.

Thus the resulting BACE protein of this aspect of the invention will be a protein whose
N-terminal corresponds to 45, 48, 58, 59 or 60 of SEQ ID NO:2, and whose C-terminal region
extends to and includes at least 398 of SEQ ID NO:2. Preferably the C-terminal region 
terminates at a residue between a point corresponding to and including 398 up to and 
including 455. This BACE protein may additionally comprise a C-terminal tag, such as a
tag comprising from 5 to 15 residues, such as a his tag or the like. 
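The relationship between the cleavage positions (42, 45, 55, 56, 57) and the resulting N-terminal residues (45, 48, 58, 59, 60 of SEQ ID NO:2) can be reproduced with the short sketch below. It rests on two assumptions that are inferred from the text rather than stated in it: that the cleavage positions are given in SwissProt P56817 numbering, and that SEQ ID NO:2 numbering runs two residues ahead of SwissProt numbering (residues 45 to 455 of SEQ ID NO:2 corresponding to 43 to 453 of P56817).

```python
# Sketch under stated assumptions: cleavage positions use SwissProt P56817
# numbering, and SEQ ID NO:2 numbering = SwissProt numbering + 2.
SEQ_ID_OFFSET = 2
CUT_SITES = (42, 45, 55, 56, 57)


def n_terminus_after_cleavage(cut_after):
    """SEQ ID NO:2 number of the first residue left after cleaving
    up to and including SwissProt position `cut_after`."""
    return cut_after + 1 + SEQ_ID_OFFSET


if __name__ == "__main__":
    for site in CUT_SITES:
        print(site, "->", n_terminus_after_cleavage(site))
```

If these assumptions hold, the two lists in the text are consistent with one another, each N-terminal residue lying three positions beyond the corresponding cleavage position.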

In another aspect of the invention there is provided a process for producing refolded 
recombinant BACE protein comprising the steps of: (a) solubilising the recombinant 
BACE; (b) diluting the solubilised BACE into an aqueous buffer containing sulfobetaine 
(for example at a concentration of 10 to 50 mM, for example 10 mM); and (c) maintaining
the diluted solution at low temperature (for example, 3 to 6°C) and at high pH (e.g. 9 to 
10.5) for at least 2 weeks (typically 3 weeks, more typically 4 weeks). 

In another aspect the invention provides a process for producing a crystal of BACE 
comprising the step of growing the crystal by vapour diffusion using a reservoir buffer that 
contains 18-26 % PEG 5000 MME (for example, 20-24 % PEG 5000 MME, e.g. 20-22.5 %
PEG 5000 MME), 180-220 mM (e.g. 200 mM) ammonium iodide and 180-220 mM (e.g.
200 mM) tri-sodium citrate (pH 6.4-6.6). In a further aspect the reservoir buffer may 
additionally comprise from 0 to 5% (v/v) glycerol, for example 2.5% v/v. 
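By way of illustration, the volumes of stock solutions needed to make up such a reservoir buffer can be estimated from C1V1 = C2V2. The sketch below picks one point inside the stated ranges (22.5 % PEG 5000 MME, 200 mM ammonium iodide, 200 mM tri-sodium citrate, 2.5 % v/v glycerol). The stock concentrations, the reading of the PEG percentage as w/v, and all names are assumptions made for the example and are not taken from the text.

```python
# Sketch only. Stock concentrations are hypothetical; final concentrations are
# one point chosen from the ranges given in the text.
# Each entry: (final concentration, stock concentration), in the same units.
COMPONENTS = {
    "PEG 5000 MME (%, assumed w/v)": (22.5, 50.0),
    "ammonium iodide (mM)": (200.0, 1000.0),
    "tri-sodium citrate (mM)": (200.0, 1000.0),
    "glycerol (% v/v)": (2.5, 100.0),
}


def stock_volume_ul(final_conc, stock_conc, total_ul):
    """Volume of stock needed (microlitres), from C1*V1 = C2*V2."""
    return final_conc * total_ul / stock_conc


def reservoir_recipe(total_ul=1000.0):
    """Return a dict of component volumes (microlitres), topped up with water."""
    recipe = {
        name: stock_volume_ul(final, stock, total_ul)
        for name, (final, stock) in COMPONENTS.items()
    }
    water = total_ul - sum(recipe.values())
    if water < 0:
        raise ValueError("stocks too dilute for the requested final concentrations")
    recipe["water"] = water
    return recipe


if __name__ == "__main__":
    for name, volume in reservoir_recipe().items():
        print(f"{name}: {volume:.1f} uL")
```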

In another aspect the invention provides various BACE crystals, including a crystal of
BACE having a hexagonal space group P6₁22 (and optionally having unit cell dimensions
of a=b=103.2 Å, c=169.1 Å, α=β=90°, γ=120°, and a unit cell variability of 5% in all
dimensions); a crystal of BACE having a resolution better than 3 Å (for example, better
than 2.5 Å, e.g. better than 1.8 Å), and a crystal of BACE comprising a structure defined by
all or a portion of the co-ordinates of Table 1.
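As a worked illustration of the unit cell parameters, the cell volume and the 5% variability test can be computed as sketched below. The sketch assumes the conventional hexagonal setting (a = b, α = β = 90°, γ = 120°), for which the volume reduces to a²·c·sin 120°; the function names are hypothetical.

```python
import math

# Sketch assuming the conventional hexagonal setting:
# a = b, alpha = beta = 90 degrees, gamma = 120 degrees.
A_REF = 103.2  # Angstrom, from the text
C_REF = 169.1  # Angstrom, from the text


def hexagonal_cell_volume(a, c):
    """Unit cell volume in cubic Angstrom for a hexagonal cell."""
    return a * a * c * math.sin(math.radians(120.0))


def within_variability(observed, reference, tolerance=0.05):
    """True if `observed` lies within +/- tolerance (fractional) of `reference`."""
    return abs(observed - reference) <= tolerance * reference


if __name__ == "__main__":
    print(f"V = {hexagonal_cell_volume(A_REF, C_REF):.0f} cubic Angstrom")
    print(within_variability(105.0, A_REF), within_variability(110.0, A_REF))
```

Under this reading, a measured cell edge a of 105.0 Å would fall within the stated 5% variability of 103.2 Å, whereas 110.0 Å would not.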

In another aspect the invention provides a three-dimensional representation of BACE or of a
portion of BACE, which representation comprises all or a portion of the coordinates of
Table 1. The representation is preferably a BACE model.

The invention also contemplates a three-dimensional representation of a compound which 
fits the BACE model of the invention.

The invention also contemplates a computer-based method for the analysis of the interaction
of a molecular structure with a BACE structure of the invention, which comprises: (a)
providing a BACE model; (b) providing a molecular structure to be fitted to said BACE
model; and (c) fitting the molecular structure to the BACE model to produce a compound
model.

In another aspect the invention provides a computer-based method for the analysis of the 
interaction of a molecular structure with a BACE structure of the invention, which 
comprises: (a) providing the structure of a BACE as defined by the coordinates of Table 1;
(b) providing a molecular structure to be fitted to said BACE structure; and (c) fitting the
molecular structure to the BACE structure of Table 1.

In another aspect the invention provides a computer-based method for the analysis of 
molecular structures which comprises: (a) providing the coordinates of at least two atoms of 
a BACE structure as defined in Table 1 ("selected coordinates"); (b) providing the structure 
of a molecular structure to be fitted to the selected coordinates; and (c) fitting the structure 
to the selected coordinates of the BACE structure.

In another aspect the invention provides a computer-based method of rational drug design 
comprising: (a) providing the coordinates of at least two atoms of a BACE
structure as defined in Table 1 ("selected coordinates"); (b) providing the structures of a
plurality of molecular fragments; (c) fitting the structure of each of the molecular fragments
to the selected coordinates; and (d) assembling the molecular fragments into a single
molecule to form a candidate modulator molecule. 

In another aspect the invention provides a method for identifying a candidate modulator 
(e.g. candidate inhibitor) of BACE comprising the steps of: (a) employing a
three-dimensional structure of BACE, at least one sub-domain thereof, or a plurality of atoms
thereof, to characterise at least one BACE binding cavity, the three-dimensional structure
being defined by atomic coordinate data according to Table 1; and (b) identifying the 
candidate modulator by designing or selecting a compound for interaction with the binding 
cavity.

In another aspect the invention provides a method for identifying an agent compound (e.g.
an inhibitor) which modulates BACE activity, comprising the steps of: (a) employing
three-dimensional atomic coordinate data according to Table 1 to characterise at least one (e.g. a
plurality of) BACE binding site(s); (b) providing the structure of a candidate agent
compound; (c) fitting the candidate agent compound to the binding sites; and (d) selecting
the candidate agent compound. 

In another aspect the invention provides a method of assessing the ability of a candidate
modulator to interact with BACE which comprises the steps of: (a) obtaining or
synthesising said candidate modulator; (b) forming a crystallized complex of BACE and
said candidate modulator; and (c) analysing said complex by X-ray crystallography or NMR
spectroscopy to determine the ability of said candidate modulator to interact with BACE. 

In another aspect the invention provides a method for determining the structure of a 
compound bound to BACE, said method comprising: (a) mixing BACE with the compound 
to form a BACE-compound complex; (b) crystallizing the BACE-compound complex; and 
(c) determining the structure of said BACE-compound complex by reference to the data
of Table 1. 

In another aspect the invention provides a method for determining the structure of a 
compound bound to BACE, said method comprising: (a) providing a crystal of BACE; (b) 
soaking the crystal with one or more compound(s) to form a complex; and (c) determining 
the structure of the complex by employing the data of Table 1.

In another aspect the invention provides a method of determining the three dimensional 
structure of a BACE homologue or analogue of unknown structure, the method comprising 
the steps of: (a) aligning a representation of an amino acid sequence of the BACE 
homologue or analogue with the amino acid sequence of the BACE of Table 1 to match 
homologous regions of the amino acid sequences; (b) modelling the structure of the
matched homologous regions of said target BACE of unknown structure on the
corresponding regions of the BACE structure as defined by Table 1; and (c) determining a
conformation for the BACE homologue or analogue which substantially preserves the 
structure of said matched homologous regions.

In another aspect the invention provides a method of providing data for generating
structures and/or performing rational drug design for BACE, BACE homologues or
analogues, complexes of BACE with a potential modulator, or complexes of BACE
homologues or analogues with potential modulators, the method comprising: (i) establishing
communication with a remote device containing computer-readable data comprising at least
one of: (a) atomic coordinate data according to Table 1, said data defining the
three-dimensional structure of BACE, at least one sub-domain of the three-dimensional structure
of BACE, or the coordinates of a plurality of atoms of BACE; (b) structure factor data for
BACE, said structure factor data being derivable from the atomic coordinate data of Table
1; (c) atomic coordinate data of a target BACE homologue or analogue generated by
homology modelling of the target based on the data of Table 1; (d) atomic coordinate data
of a protein generated by interpreting X-ray crystallographic data or NMR data by reference
to the data of Table 1; and (e) structure factor data derivable from the atomic coordinate
data of (c) or (d); and (ii) receiving said computer-readable data from said remote device.

In another aspect the invention provides a computer system containing one or more of: (a)
atomic coordinate data according to Table 1, said data defining the three-dimensional
structure of BACE or at least selected coordinates thereof; (b) structure factor data (where a
structure factor comprises the amplitude and phase of the diffracted wave) for BACE, said
structure factor data being derivable from the atomic coordinate data of Table 1; (c) atomic
coordinate data of a target BACE protein generated by homology modelling of the target
based on the data of Table 1; (d) atomic coordinate data of a target BACE protein generated
by interpreting X-ray crystallographic data or NMR data by reference to the data of Table 1;
or (e) structure factor data derivable from the atomic coordinate data of (c) or (d).
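The statement that structure factor data are derivable from atomic coordinate data can be illustrated with the standard summation F(hkl) = Σ f_j · exp[2πi(h·x_j + k·y_j + l·z_j)] over the atoms of the unit cell. The sketch below is deliberately simplified: it assumes fractional coordinates, treats each scattering factor f_j as a constant, and ignores thermal (B) factors and symmetry expansion. The two-atom cell is a toy example, not data from Table 1, and the function names are hypothetical.

```python
import cmath
import math

# Simplified sketch: constant scattering factors, no B-factors, no symmetry.
# Each atom is (f, x, y, z) with x, y, z as fractional coordinates.


def structure_factor(hkl, atoms):
    """Complex structure factor F(hkl) for a list of (f, x, y, z) atoms."""
    h, k, l = hkl
    return sum(
        f * cmath.exp(2j * math.pi * (h * x + k * y + l * z))
        for f, x, y, z in atoms
    )


def amplitude_and_phase(F):
    """Return (amplitude, phase in degrees) of a complex structure factor."""
    return abs(F), math.degrees(cmath.phase(F))


if __name__ == "__main__":
    # Toy cell: two identical atoms at the origin and the body centre.
    toy = [(1.0, 0.0, 0.0, 0.0), (1.0, 0.5, 0.5, 0.5)]
    for hkl in [(1, 0, 0), (1, 1, 0)]:
        print(hkl, amplitude_and_phase(structure_factor(hkl, toy)))
```

In this toy cell the two atoms scatter out of phase for the (1,0,0) reflection and in phase for (1,1,0), which is why the first amplitude vanishes and the second is doubled.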

In another aspect the invention provides a computer-readable storage medium, comprising a
data storage material encoded with computer readable data, wherein the data are defined by
all or a portion of the structure coordinates of BACE of Table 1, or a homologue of BACE,
wherein said homologue comprises backbone atoms that have a root mean square deviation
from the Cα or backbone atoms (nitrogen-carbonα-carbon) of Table 1 of less than 2.0 Å,
preferably less than 1.5 Å, more preferably less than 1.0 Å, even more preferably less than
0.74 Å, even more preferably less than 0.72 Å and most preferably less than 0.5 Å when
superimposed on the coordinates provided in Table 1 for the residue backbone atoms.
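The root mean square deviation referred to above can be computed as in the following sketch. It assumes that the two sets of backbone (or Cα) coordinates have already been superimposed and are listed in the same atom order; the superposition step itself (for example a least-squares fit) is not shown. The function names and the toy coordinates are hypothetical.

```python
import math

# Sketch only: coordinates are assumed to be already superimposed and paired
# atom-for-atom. Thresholds (in Angstrom) are those listed in the text.
THRESHOLDS = (2.0, 1.5, 1.0, 0.74, 0.72, 0.5)


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equal-length lists of (x, y, z)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(total / len(coords_a))


def tightest_threshold_met(value):
    """Smallest listed threshold that `value` is strictly below, or None."""
    met = [t for t in THRESHOLDS if value < t]
    return min(met) if met else None


if __name__ == "__main__":
    reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    candidate = [(0.0, 0.0, 0.6), (1.0, 0.0, 0.6)]
    value = rmsd(reference, candidate)
    print(value, tightest_threshold_met(value))
```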
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In another aspect the invention provides a computer-readable data storage medium 
comprising a data storage material encoded with a first set of computer-readable data 
comprising a Fourier transform of at least a portion (e.g. selected coordinates as defined 
herein) of the structural coordinates for BACE according to Table 1 ; which, when combined 
5 with a second set of machine readable data comprising an X-ray diffraction pattern of a 
molecule or molecular complex of unknown structure, using a machine programmed with 
the instructions for using said first set of data and said second set of data, can determine at 
least a portion of the structure coordinates corresponding to the second set of machine 
readable data. 

In another aspect the invention provides a computer readable medium with at least one of:
(a) atomic coordinate data according to Table 1 recorded thereon, said data defining the
three-dimensional structure of BACE, or at least selected coordinates thereof; (b) structure
factor data for BACE recorded thereon, the structure factor data being derivable from the
atomic coordinate data of Table 1; (c) atomic coordinate data of a target BACE protein
generated by homology modelling of the target based on the data of Table 1; (d) atomic
coordinate data of a BACE-ligand complex or a BACE homologue or analogue generated
by interpreting X-ray crystallographic data or NMR data by reference to the data of Table 1;
and (e) structure factor data derivable from the atomic coordinate data of (c) or (d).

In another aspect the invention provides a method for determining the structure of a protein,
which method comprises: providing the co-ordinates of Table 1, and either (a) positioning
the co-ordinates in the crystal unit cell of said protein so as to provide a structure for said
protein or (b) assigning NMR spectra peaks of said protein by manipulating the co-ordinates
of Table 1.

In another aspect the invention contemplates BACE modulator molecules, medicaments,
pharmaceutical compositions and drugs obtainable by, or obtained by, the processes and
methods of the invention, and methods of therapy (e.g. the treatment of Alzheimer's
disease) using such products.

It is to be understood that, except where explicitly stated otherwise, references herein to
"BACE protein" or "BACE peptide" and to "mutant BACE protein" or "mutant BACE peptide",
as well as references to any of the foregoing which are further defined inter alia by
reference to one or more specific amino acid sequences, are intended to cover BACE
homologues, allelic forms, species variants, derivatives and muteins thereof (as defined
below).

Thus, references to mutant BACE proteins having particular amino acid sequences may
optionally be interpreted to cover the corresponding homologues, allelic forms, species
variants, derivatives and muteins (as defined below) of that particular BACE amino acid
sequence.

Definitions 

Where used herein and unless specifically indicated otherwise, the following terms are
intended to have the following meanings in addition to any broader (or narrower) meanings
the terms might enjoy in the art:

The term "isolated" is used herein to indicate that the isolated moiety (e.g. peptide or
nucleic acid) exists in a physical milieu distinct from that in which it occurs in nature. For
example, the isolated peptide may be substantially isolated with respect to the complex
cellular milieu in which it naturally occurs. The absolute level of purity is not critical, and
those skilled in the art can readily determine appropriate levels of purity according to the
use to which the peptide is to be put. The term "isolating" when used as a step in a process
is to be interpreted accordingly.

In many circumstances, the isolated moiety will form part of a composition (for example a
more or less crude extract containing many other molecules and substances), buffer system,
matrix or excipient, which may for example contain other components (including proteins,
such as albumin).

In other circumstances, the isolated moiety may be purified to essential homogeneity, for
example as determined by PAGE or column chromatography (for example HPLC or mass
spectrometry). In preferred embodiments, the isolated peptide or nucleic acid of the
invention is essentially the sole peptide or nucleic acid in a given composition.

The proteins and nucleic acids of the invention need not be isolated in the sense defined
above, however. For example, more or less crude culture supernatants (e.g. "spent"
medium) may contain sufficient concentrations of the proteins or nucleic acids of the
invention for use in several applications. Preferably, such supernatants are fractionated
and/or extracted, but in many circumstances they may be used without pretreatment. They
are preferably derived from spent media used to culture the host cells of the invention (for
example, the bacterial sources described infra). The supernatants are preferably sterile. They
may be treated in various ways, for example by concentration, filtration, centrifugation,
spray drying, dialysis and/or lyophilisation. Conveniently, the culture supernatants are
simply centrifuged to remove cells/cell debris and filtered.

The term "pharmaceutical composition" is used herein to define a solid or liquid
composition in a form, concentration and level of purity suitable for administration to a
patient (e.g. a human or animal patient) upon which administration it can elicit the desired
physiological changes.

The term "recombinant" as applied to the proteins of the invention is used herein to define a
protein that has been produced by that body of techniques collectively known as
"recombinant DNA technology" (for example, using the nucleic acid, vectors and/or host
cells described herein).

The term "synthetic" as applied to the peptides of the invention is used herein to define a 
peptide that has been chemically synthesised in vitro (for example by any of the 
commercially available solid-phase peptide-synthesis systems). 

As used herein in relation to the vectors of the invention, the term "operably linked" refers
to a condition in which portions of a linear nucleic acid sequence are capable of influencing
the activity of other portions of the same linear nucleic acid sequence. For example, DNA
for a signal peptide (secretory leader) is operably linked to DNA for a polypeptide if it is
expressed as a precursor which participates in the secretion of the polypeptide; a promoter is
operably linked to a coding sequence if it controls the transcription of the sequence; a
ribosome binding site is operably linked to a coding sequence if it is positioned in the
correct reading-frame so as to permit translation.

By "apo-structure" we mean the three-dimensional structure of the protein that contains no
ligand (e.g. substrate, product, cofactor or inhibitor), i.e. the active site of the protein is
empty.

In the following by "binding site" or "binding cavity" we mean a site (such as an atom, a
functional group of an amino acid residue or a plurality of such atoms and/or groups) in a
BACE binding cavity, which may bind to an agent compound such as a candidate inhibitor.
Depending on the particular molecule in the cavity, sites may exhibit attractive or repulsive
binding interactions, brought about by charge, steric considerations and the like.

Binding sites are sites within a macromolecule, or on its surface, at which ligands can bind.
Examples are the catalytic or active site of an enzyme (the site on an enzyme at which the
amino acid residues involved in catalysing the enzymatic reaction are located), allosteric
binding sites (ligand binding sites distinct from the catalytic site, but which can modulate
enzymatic activity upon ligand binding), cofactor binding sites (sites involved in
binding/co-ordinating cofactors e.g. metal ions), or substrate binding sites (the ligand
binding sites on a protein at which the substrates for the enzymatic reaction bind). There
are also sites of protein-protein interaction.

In the following by "active site" we mean a site (such as an atom, a functional group of an 
amino acid residue or a plurality of such atoms and/or groups) in a BACE binding cavity, 
which is involved in catalysis. 

By "fitting" is meant determining, by automatic or semi-automatic means, interactions
between one or more atoms of a candidate molecule and at least one atom of a BACE
structure of the invention, and calculating the extent to which such interactions are stable.
Interactions include attraction and repulsion, brought about by charge, steric considerations
and the like. Various computer-based methods for fitting are described further herein.

By "root mean square deviation" we mean the square root of the arithmetic mean of the
squares of the deviations from the mean.
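
When this measure is applied to a comparison of a structure with the coordinates of
Table 1, the deviations are conventionally taken as the distances between equivalent
atoms (for example Cα atoms) after superposition. The following minimal Python sketch
assumes two coordinate lists that are already superimposed, of equal length and in the
same atom order; the function name and the example coordinates are illustrative
assumptions, and no superposition step is performed.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between paired, pre-superimposed atoms.

    coords_a, coords_b: sequences of (x, y, z) tuples in Angstroms, where
    element i of one list is the equivalent atom of element i of the other.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Squared distance between the two equivalent atoms.
        squared += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared / len(coords_a))


# Hypothetical coordinates: every atom displaced by 1 Angstrom along z.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
candidate = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
value = rmsd(reference, candidate)
```

A value computed in this way could then be compared with a chosen threshold such as
2.0 Å or 0.5 Å.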

By a "computer system" we mean the hardware means, software means and data storage
means used to analyse atomic coordinate data. The minimum hardware means of the
computer-based systems of the present invention typically comprises a central processing
unit (CPU), input means, output means and data storage means. Desirably a monitor is
provided to visualise structure data. The data storage means may be RAM or means for
accessing computer readable media of the invention. Examples of such systems are
microcomputer workstations available from Silicon Graphics Incorporated and Sun
Microsystems running Unix based, Windows NT or IBM OS/2 operating systems.

By "computer readable media" we mean any medium or media, which can be read and
accessed directly by a computer e.g. so that the media is suitable for use in the
above-mentioned computer system. Such media include, but are not limited to: magnetic storage
media such as floppy discs, hard disc storage medium and magnetic tape; optical storage
media such as optical discs or CD-ROM; electrical storage media such as RAM and ROM;
and hybrids of these categories such as magnetic/optical storage media.

The term "homologue" is used herein in two distinct senses. It is used sensu stricto to define
proteins that share a common ancestor. In this sense it covers orthologues (species variants
which have diverged in different organisms following a speciation event) and paralogues
(variants which have diverged within the same organism after a gene duplication event).
Thus, there is a direct evolutionary relationship between such homologues and this may be
reflected in structural and/or functional similarities. For example, orthologues may perform
the same role in each organism in which they are found, while paralogues may perform
functionally related (but distinct) roles within the same organism.

The term is also used herein sensu lato to define proteins which are to some extent
structurally similar (i.e. not necessarily evolutionarily related and/or structurally and
functionally equivalent). In this sense, homology is recognised on the basis of purely
structural criteria by the presence of amino acid sequence identities and/or conservative
amino acid changes and/or similar secondary, tertiary or quaternary structures.

The term "analogue" is used herein to define proteins with similar functions and/or
structures and which are not necessarily evolutionarily related. Protein analogues which
share function but which have no or little structural similarities are likely to have arisen by
convergent evolution. Conversely, protein analogues which share structural similarities but
which exhibit few or no functional similarities are likely to have arisen by divergent
evolution. Protein analogues may be identified, for example, by screening a library of
proteins to detect those with similar function(s) but different physical properties, or by
screening for proteins which share structural features but not necessarily any functions (e.g.
by immunological screening).

The term "equivalent" is used herein to define those protein analogues which exhibit
substantially the same function(s) and which share at least some structural features (e.g.
functional domains), but which have not evolved from a common ancestor. Such
equivalents are typically synthetic proteins (see below) and may be generated, for example,
by identifying sequences of functional importance (e.g. by identifying conserved or
canonical sequences, functional domains or by mutagenesis followed by functional assay),
selecting an amino acid sequence on that basis and then synthesising a peptide based on the
selected amino acid sequence. Such synthesis can be achieved by any of many different
methods known in the art, including solid phase peptide synthesis (to generate synthetic
peptides) and the assembly (and subsequent cloning) of oligonucleotides. Some synthetic
protein analogues may be chimaeras (see below), and such equivalents can be designed and
assembled for example by concatenation of two or more different structural and/or
functional peptide domains from different proteins using recombinant DNA techniques (see
below).

The BACE protein homologues of the invention therefore include proteins and peptides
having at least 75%, 80%, 85%, 90%, 95%, 97%, 98% or 99% sequence identity with the
reference protein, and include truncated forms of the BACE proteins of the invention. Such
truncates are preferably at least 25%, 35%, 50% or 75% of the length of the corresponding
specifically exemplified proteins and may have at least 60% sequence identity (more
preferably, at least 75%, 80%, 85%, 90%, 95%, 97%, 98% or 99% sequence identity) with
that specifically exemplified protein.

Particularly preferred homologues are truncates that contain a segment preferably
comprising at least 8, 15, 20 or 30 contiguous amino acids that share at least 75%, 80%,
85%, 90%, 95%, 97%, 98% or 99% sequence identity with that specifically exemplified
protein.

A "conservative amino acid change" is one in which the amino acid residue is replaced with
an amino acid residue having a similar side chain. Families of amino acid residues having
similar side chains have been defined in the art. These families include amino acids with
basic side chains (e.g. lysine, arginine and histidine), acidic side chains (e.g. aspartic acid
and glutamic acid), non-charged polar side chains (e.g. glycine, asparagine, glutamine,
serine, threonine, tyrosine and cysteine), non-polar side chains (e.g. alanine, valine, leucine,
isoleucine, proline, phenylalanine, methionine and tryptophan), beta-branched side chains
(e.g. threonine, valine and isoleucine), and aromatic side chains (e.g. tyrosine,
phenylalanine, tryptophan and histidine).

Thus, references herein to proteins and peptides that are to some defined extent "identical"
(or which share a defined extent of "identity") with a reference protein or peptide may also
optionally be interpreted to include proteins and peptides in which conservative amino acid
changes are disregarded so that the original amino acid and its changed counterpart are
regarded as identical for the purposes of sequence comparisons.
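
The effect of disregarding conservative amino acid changes on a percentage identity can
be sketched in Python as follows. The side-chain families are those listed above, written
in one-letter code; the function names, the choice to skip gapped columns when counting
compared positions, and the four-residue example are assumptions made for illustration
only, since different alignment programs use different denominators.

```python
# Side-chain families as listed in the text, in one-letter amino acid code.
FAMILIES = [
    set("KRH"),       # basic
    set("DE"),        # acidic
    set("GNQSTYC"),   # non-charged polar
    set("AVLIPFMW"),  # non-polar
    set("TVI"),       # beta-branched
    set("YFWH"),      # aromatic
]


def is_conservative(a, b):
    """True if a and b differ but share at least one side-chain family."""
    return a != b and any(a in family and b in family for family in FAMILIES)


def percent_identity(aligned_a, aligned_b, count_conservative=False):
    """Percentage of compared alignment columns regarded as identical.

    aligned_a, aligned_b: equal-length aligned sequences with '-' for gaps.
    Columns containing a gap are skipped (an assumed convention).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = compared = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a == b or (count_conservative and is_conservative(a, b)):
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Hypothetical aligned fragments, not BACE sequences.
strict = percent_identity("ARKD", "AKKE")
relaxed = percent_identity("ARKD", "AKKE", count_conservative=True)
```

In this toy alignment the R to K and D to E changes are conservative, so the relaxed
identity is higher than the strict identity.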

The term "allelic form" is used herein to define naturally-occurring alternative forms of
the sequence present in the BACE protein which reflect naturally-occurring differences in
the BACE gene pool. Preferably, allelic variants of the proteins of the invention have at
least 60% sequence identity (more preferably, at least 75%, 80%, 85%, 90% or 95%
sequence identity) with the corresponding specifically exemplified BACE protein, where
sequence identity is determined by comparing the nucleotide sequences of the
polynucleotides when aligned so as to maximize overlap and identity while minimizing
sequence gaps.

The term "species variant" (or orthologue) is used herein to define the corresponding 
protein from a different organism. Thus, species variants share a direct evolutionary 
relationship. 

The term "derivative" as applied herein to the BACE proteins of the invention is used to
define proteins which are modified versions of the specifically exemplified proteins of the
invention. Such derivatives may include fusion proteins, in which the proteins of the
invention have been fused to one or more different proteins, peptides or amino acid tags (for
example an antibody or a protein domain conferring a biochemical activity, to act as a label,
or to facilitate purification). Particularly preferred are derivatives in which the peptides are
modified by a polyHis (6xHis) tag to facilitate purification of the peptide derivative on Ni
agarose beads.

The derivatives may also be products of synthetic processes that use a peptide of the 
invention as a starting material or reactant. 

The term "mutein" is used herein to define proteins that are mutant forms of the BACE
proteins of the invention, i.e. proteins in which one or more amino acids have been added,
altered, deleted, replaced, inserted or substituted. Thus, the terms "BACE mutein" and
"mutant BACE protein" are used interchangeably herein. The muteins/mutant BACE
proteins of the invention therefore include fragments, truncates and fusion proteins and
peptides (e.g. comprising fused immunoglobulin, receptor, tag, label or enzyme moieties).

The muteins of the invention therefore include truncated forms of the BACE proteins of the
invention. Such truncates are preferably at least 25%, 35%, 50% or 75% of the length of the
corresponding specifically exemplified BACE protein and may have at least 60% sequence
identity (more preferably, at least 75%, 80%, 85%, 90% or 95% sequence identity) with that
specifically exemplified protein.

Particularly preferred are truncates that contain a segment preferably comprising at least 8, 
15, 20 or 30 contiguous amino acids that share at least 75%, 80%, 85%, 90% or 95% 
sequence identity with that specifically exemplified protein. 

For the purposes of the present invention, sequence identity is determined by comparing the
amino acid sequences of the proteins when aligned so as to maximize overlap and identity
while minimizing sequence gaps. In particular, sequence identity may be determined using
any of a number of mathematical algorithms. A nonlimiting example of a mathematical
algorithm used for comparison of two sequences is the algorithm of Karlin and Altschul
(1990) Proc. Natl. Acad. Sci. USA 87: 2264-2268, modified as in Karlin and Altschul
(1993) Proc. Natl. Acad. Sci. USA 90: 5873-5877.

Another example of a mathematical algorithm used for comparison of sequences is the
algorithm of Myers and Miller (1988) CABIOS 4: 11-17. Such an algorithm is incorporated
into the ALIGN program (version 2.0) which is part of the GCG sequence alignment
software package. When utilizing the ALIGN program for comparing amino acid
sequences, a PAM120 weight residue table, a gap length penalty of 12, and a gap penalty of
4 can be used. Yet another useful algorithm for identifying regions of local sequence
similarity and alignment is the FASTA algorithm as described in Pearson and Lipman
(1988) Proc. Natl. Acad. Sci. USA 85: 2444-2448.

Preferred for use according to the present invention is the WU-BLAST (Washington
University BLAST) version 2.0 software. WU-BLAST version 2.0 executable programs for
several UNIX platforms can be downloaded from ftp://blast.wustl.edu/blast/executables.
This program is based on WU-BLAST version 1.4, which in turn is based on the public
domain NCBI-BLAST version 1.4 (Altschul and Gish, 1996, Local alignment statistics,
Doolittle ed., Methods in Enzymology 266: 460-480; Altschul et al., 1990, Basic local
alignment search tool, Journal of Molecular Biology 215: 403-410; Gish and States, 1993,
Identification of protein coding regions by database similarity search, Nature Genetics 3:
266-272; Karlin and Altschul, 1993, Applications and statistics for multiple high-scoring
segments in molecular sequences, Proc. Natl. Acad. Sci. USA 90: 5873-5877; all of which
are incorporated by reference herein).

In all search programs in the suite the gapped alignment routines are integral to the database
search itself. Gapping can be turned off if desired. The default penalty (Q) for a gap of
length one is Q=9 for proteins and BLASTP, and Q=10 for BLASTN, but may be changed
to any integer. The default per-residue penalty for extending a gap (R) is R=2 for proteins
and BLASTP, and R=10 for BLASTN, but may be changed to any integer. Any
combination of values for Q and R can be used in order to align sequences so as to
maximize overlap and identity while minimizing sequence gaps. The default amino acid
comparison matrix is BLOSUM62, but other amino acid comparison matrices such as PAM
can be utilized.
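
The way Q and R combine can be sketched in Python as follows, on the reading that a gap
of length one costs Q and each further residue of the same gap costs R. The function name
is hypothetical and the code is an illustration of this scoring arithmetic only; it does
not call or reproduce the WU-BLAST software.

```python
def gap_cost(length, q=9, r=2):
    """Penalty for a single gap of the given length.

    q: penalty for a gap of length one (default as quoted for BLASTP).
    r: per-residue penalty for each residue extending the gap beyond one.
    """
    if length < 1:
        raise ValueError("gap length must be at least 1")
    return q + r * (length - 1)


# Protein defaults quoted above (Q=9, R=2) for gaps of one to four residues.
protein_costs = [gap_cost(n) for n in range(1, 5)]

# Nucleotide defaults quoted above (Q=10, R=10) for a gap of three residues.
nucleotide_cost = gap_cost(3, q=10, r=10)
```

With the protein defaults a single long gap is penalised less heavily than several
separate short gaps of the same total length, which is the usual motivation for scoring
gap opening and gap extension separately.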

The muteins of the invention also include peptides in which mutations have been introduced 
which effectively promote or impair one or more activities of the protein, for example 
mutations which promote or impair the function of a receptor, a recognition sequence or an 
effector binding site. 

Muteins may be produced by any convenient method. Conveniently, site-directed
mutagenesis with mutagenic oligonucleotides may be employed using a double stranded
template (a pBluescript KS II construct containing nucleic acid encoding the BACE protein),
e.g. with Chameleon™ or QuikChange™ (Stratagene™), or cassette mutagenesis methods
may be employed. After verifying each mutant derivative by sequencing, the mutated gene is
excised and inserted into a suitable vector so that the modified protein can be
over-expressed and purified.

Brief Description of the Drawings 

Table 1 provides the coordinates of the BACE structure. The numbering of the residues
used in this Table (see Section (D) below) corresponds to the numbering used by Hong et
al, ibid. Elsewhere in the specification, unless indicated to the contrary, the numbering of
the SwissProt database entry P56817 is used. Residue 1 of Table 1 corresponds to residue
62 of SwissProt P56817, and residue 385 corresponds to residue 446 of SwissProt P56817.
In the sequence listing below, the SwissProt P56817 residues 14-453 are shown as residues
16-455 of SEQ ID NO:2.
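
The relationship between these three numbering schemes can be sketched in Python as
follows. The constant offsets are derived from the pairs of residues quoted above (1 and
62, 385 and 446; 14 and 16, 453 and 455); the function names are hypothetical, and the
sketch assumes that the offset is constant between the quoted end points, i.e. that there
are no insertions or deletions in between.

```python
# Offsets derived from the residue pairs quoted in the text.
TABLE1_TO_SWISSPROT = 62 - 1       # same as 446 - 385
SWISSPROT_TO_SEQ_ID_2 = 16 - 14    # same as 455 - 453


def table1_to_swissprot(residue):
    """Convert Table 1 numbering to SwissProt P56817 numbering."""
    return residue + TABLE1_TO_SWISSPROT


def swissprot_to_table1(residue):
    """Convert SwissProt P56817 numbering to Table 1 numbering."""
    return residue - TABLE1_TO_SWISSPROT


def swissprot_to_seq_id_2(residue):
    """Convert SwissProt P56817 numbering to SEQ ID NO:2 numbering."""
    return residue + SWISSPROT_TO_SEQ_ID_2


# Example: arginine 56 in SwissProt numbering, expressed in the other schemes.
r56_in_seq_id_2 = swissprot_to_seq_id_2(56)
r56_in_table1 = swissprot_to_table1(56)
```

A non-positive result from swissprot_to_table1 simply indicates a residue that precedes
residue 1 of the Table 1 numbering.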

Figure 1 represents the packing arrangements of the BACE monomers within the P6₁22
crystal lattice.

Figure 2 shows the superposition of BACE in complex with OM99-2 (1FKN), in black, 
with BACE, of the invention, in the absence of ligand (grey). The position of OM99-2 is 
defined by a stick representation of the inhibitor. 

Detailed Description of the Invention

A. Construct design

BACE protease is expressed, at high levels, as insoluble inclusion bodies in bacterial cells.
To prepare functional protein appropriate for enzyme assay and structural studies these
inclusion bodies are solubilised using denaturants and the slow removal of these denaturants
results in the formation of the correct tertiary structure. In addition BACE is expressed as a
pro-sequence and requires activation by a protease before it is fully functional.

One of the problems of the techniques described in the art (Tang et al) for isolation of
BACE from inclusion bodies is the generation of a mixture of products from the
uncontrolled cleavage process. Choppa et al describe the isolation of BACE from
mammalian cells and the subsequent cleavage with protease, which also gives a mixture of
protein species. Thus there is a need in the art for a method of generating active BACE as a
homogeneous species.

A further problem with the prior art techniques is the low yield of crystallisable material
obtained. The inventors surprisingly found that the present invention results in a high yield
from bacterial cells, in particular E. coli.

The inventors utilized clostripain as an activating protease to perform this cleavage in a
controlled manner but this produced multiple species of BACE, as determined by mass
spectrometry. In order to obtain a uniform homogeneous protein after activation, a number
of different constructs were produced. These constructs focused on the mutation of two of
the clostripain cleavage sites (R56 and R57).

The sequences of the invention were designed to achieve a single cleavage point upon
activation by clostripain, as activation of wild type sequence in this way resulted in a
non-crystallisable protein with heterogeneous N termini.

The BACE constructs of the invention contain successful modifications of the BACE
sequence to allow generation of homogeneous protein product from the use of clostripain.
The sequence of the invention contains a substitution by another amino acid residue, or a
deletion, of arginine 56 and/or arginine 57 (numbering based on wild type full length
sequence, SWISSPROT P56817). In a preferred aspect of the invention this is a conservative
substitution. Conservative amino acid substitutions are well known in the art, and include
substitutions made on the basis of similarity in polarity, charge, solubility, hydrophobicity,
hydrophilicity and/or the amphipathic nature of the amino acid residues involved. For
example, positively charged amino acids include lysine, arginine and histidine. In a
preferred aspect the mutation introduced is substitution of arginine by lysine at position 56
and/or 57, more preferably 56 and 57. This results in, as opposed to the wild type, the
production of a single species of activated protein upon limited digest with clostripain.
Clostripain cleavage occurs at a single site and is thus specific and generates a single
species in minutes.

The advantage of these mutations is that they allow the controlled cleavage at arginine
residue 42 and hence provide a single N-terminus.

This controlled cleavage thus provides a means to produce a substantially homogeneous
composition of a BACE protein of the invention. By substantially homogeneous, it is meant
that at least 95%, preferably at least 98% and more preferably at least 99% of the BACE
protein in the composition has the same N-terminus. The N-terminus may be selected from
residues 43 (i.e. by cleavage at 42), 46, 56, 57 or 58, preferably from 43, 56, 57 or 58, more
preferably 43, 56 or 57.

These mutations can be introduced onto any sequence of BACE by site-directed
mutagenesis techniques, to facilitate the generation of homogeneous material for structural
or activity studies. Thus proteins of the invention are BACE proteins with residues 56
and/or 57 either mutated or deleted. Proteins of the invention also include BACE mutants
described below in section (C).

The invention is exemplified by several constructs (SEQ ID 5-18). These were built based
on the wild type sequence (BACE WT, SEQ ID 2) where R56 and/or R57 were mutated to
K or deleted. These were BACE WT R56KR57K (SEQ ID 6), BACE WT R57K (SEQ ID
8) and BACE WT R57del (SEQ ID 10). This was also performed on the BACE construct
BACE N->Q to give BACE N->Q R56KR57K (SEQ ID 12), BACE N->Q R57K (SEQ ID
16) and BACE N->Q R57del (SEQ ID 18). The BACE N->Q construct contains 4 additional
mutations of asparagine to glutamine and a C-terminal His tag as well as the arginine
mutations. BACE N->Q without the His tag was mutated at 56 and 57 to give BACE N->Q
R56K R57K no His (SEQ ID 14).
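
The construct names above encode each change as wild-type residue, position and either
the new residue or "del". A minimal Python sketch of applying such changes to a sequence
is given below. The function name is hypothetical, and the placeholder sequence is an
artificial string with arginines at positions 56 and 57; it is not the BACE sequence of
SEQ ID 2, and positions are simply counted from 1 along the supplied string.

```python
import re

MUTATION_PATTERN = re.compile(r"^([A-Z])(\d+)(del|[A-Z])$")


def apply_mutations(sequence, mutations):
    """Apply substitutions such as 'R56K' and deletions such as 'R57del'.

    Positions are 1-based and refer to the unmodified input sequence, so
    deletions do not shift the numbering of the other mutations.
    """
    residues = list(sequence)
    for mutation in mutations:
        match = MUTATION_PATTERN.match(mutation)
        if match is None:
            raise ValueError(f"unrecognised mutation: {mutation}")
        wild_type, position, change = match.group(1), int(match.group(2)), match.group(3)
        if position < 1 or position > len(residues):
            raise ValueError(f"position out of range in {mutation}")
        if sequence[position - 1] != wild_type:
            raise ValueError(f"{mutation}: residue {position} is not {wild_type}")
        # Mark deletions with None and remove them once all edits are made.
        residues[position - 1] = None if change == "del" else change
    return "".join(r for r in residues if r is not None)


# Artificial placeholder, NOT a BACE sequence: arginines at positions 56 and 57.
placeholder = "A" * 55 + "RR" + "A" * 10
double_lysine = apply_mutations(placeholder, ["R56K", "R57K"])
single_deletion = apply_mutations(placeholder, ["R57del"])
```

Checking the stated wild-type residue before each change guards against applying a
mutation with the wrong numbering scheme.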

SEQ ID 19 is the activated form of SEQ ID 6, SEQ ID 21 the activated form of SEQ ID 12
and SEQ ID 20 the activated form of SEQ ID 14, i.e. the form in which the protein is
crystallized.

The three BACE constructs BACE WT R56KR57K, BACE N->Q R56KR57K, and BACE 
N->Q R56KR57K no His gave higher expression levels. 

Thus the invention concerns any BACE proteins with one or more of: a mutation at 56, a
mutation at 57, a deletion at 56 or a deletion at 57, but preferably 56 and 57 mutated, and
crystals thereof, i.e. any BACE protein comprising residues 56-396 of BACE (based on the
numbering of SwissProt P56817) and containing these mutations.

B. Refolding protocol 

The protein was expressed in E. coli as inclusion bodies, as outlined above. In an
improvement of existing techniques BACE isolated from inclusion bodies was refolded by
the use of high pH, a sulfobetaine refolding agent, and a longer duration at high pH. This
refolding protocol increased the yield of refolded protein obtained and also gave high and
reproducible yields of refolded BACE suitable for crystallisation.

The use of high pH in refolding (Burton et al, 1989) and of sulfobetaines as solubilising
molecules in folding experiments (Goldberg et al, 1996) has previously been described.
Here we describe the use of a combination of these technologies to give an unprecedentedly
high yield of BACE. In addition to this combination of high pH and sulfobetaine, in
another deviation from existing protocols for refolding BACE, the pH is maintained at high
pH for at least 2 weeks. This is in comparison to the method of Tang et al, where BACE is
solubilised at high pH and then the pH is lowered before protein recovery at least 2-3 weeks
later, preferably 3-4 weeks later.

Another aspect of the invention therefore concerns a novel method of producing soluble
BACE proteins of the invention, utilizing a refolding protocol comprising the combined
techniques of high pH buffer and the use of sulfobetaine, and also maintaining this high pH
over at least two weeks.

More specifically, the invention provides a method for producing refolded recombinant
BACE comprising refolding the BACE under conditions which denature and then slowly
renature the enzyme into a soluble form wherein: (a) the BACE is solubilised using a
chaotrope such as urea or guanidine at 8-10 M (typically 8 M urea solution) including one or
more reducing agents at a pH of greater than 8.0, e.g. pH 9.0-10.5; (b) the BACE is then
diluted into an aqueous buffer, such as 20 mM Tris, pH 9.0, containing sulfobetaine,
preferably 10 mM sulfobetaine, where the sulfobetaine is preferably NDSB256
(3-(benzyldimethylammonio)propanesulfonate); (c) the solution is maintained at low
temperature, e.g. 3-6 °C, typically 4 °C, and at high pH, typically approximately pH 9.0, for
at least 2 weeks (typically 3 weeks, more typically 4 weeks) before proceeding with
purification.

C. Protein Crystals

Described herein is a crystal of BACE having a hexagonal space group P6₁22, and unit cell
dimensions a=b=103.2 Å, c=169.1 Å, α=β=90°, γ=120°. Unit cell variability of 5% may
be observed in all dimensions. Such crystals contain one copy of BACE in the asymmetric
unit.
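
For orientation, the volume of such a unit cell can be estimated with the general cell
volume formula, as in the following Python sketch. It assumes the angles α=β=90° and
γ=120° that a hexagonal lattice requires, and it assumes the twelve symmetry-equivalent
positions of space group P6₁22 when dividing the cell into asymmetric units; the function
name is hypothetical and the result is an estimate, not a value reported in the
specification.

```python
import math


def cell_volume(a, b, c, alpha, beta, gamma):
    """Unit cell volume in cubic Angstroms from cell lengths (Angstroms)
    and cell angles (degrees), using the general triclinic formula."""
    ca = math.cos(math.radians(alpha))
    cb = math.cos(math.radians(beta))
    cg = math.cos(math.radians(gamma))
    return a * b * c * math.sqrt(1 - ca**2 - cb**2 - cg**2 + 2 * ca * cb * cg)


# Cell dimensions quoted in the text; hexagonal angles are an assumption
# imposed by the lattice type.
volume = cell_volume(103.2, 103.2, 169.1, 90.0, 90.0, 120.0)

# Assumed: 12 asymmetric units per cell in P6(1)22, one BACE copy in each.
ASYMMETRIC_UNITS_PER_CELL = 12
volume_per_copy = volume / ASYMMETRIC_UNITS_PER_CELL
```

With these dimensions the cell volume is roughly 1.56 million cubic Ångströms; applying
the stated 5% variability to each cell length would change this estimate accordingly.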

Such a crystal may be obtained using the methods described in the accompanying examples. 

The crystal may be of the BACE protein of SEQ ID 19 although as explained earlier any
homologue, allelic form, species variant, derivative or mutein (as hereinbefore defined) may
be used. Thus, it will be understood by those of skill in the art that some variation to the
primary amino acid sequence may be made without significant alteration to the resulting
crystal structure. Such minor variations include the replacement of one or more amino
acids, for example from 1 to 30, such as 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10 amino acids by an
equivalent or fewer number of amino acids.

The methodology used to provide a BACE crystal illustrated herein may be used generally
to provide a human BACE apo crystal resolvable at a resolution of at least 3 Å.

The invention thus further provides an apo BACE crystal having a resolution better than, i.e.
numerically lower than, 2.5 Å.

The invention also provides a BACE crystal having a resolution better than, i.e. numerically
lower than, 1.8 Å.

The invention also provides apo crystals of BACE resolvable to at least 2.5 Å capable of
being soaked with compound(s) to form co-complex structures.

The proteins may be wild-type proteins or variants thereof which are modified to promote
crystal formation, for example by N-terminal truncations and/or deletion of loop regions
that prevent crystal formation.

The methods described herein may be used to make a BACE protein crystal, particularly of
a BACE protein of SEQ ID 19-21, which method comprises growing a crystal by vapour
diffusion using a reservoir buffer that contains 18-26 % PEG 5000 MME, preferably 20-24
% PEG 5000 MME, more preferably 20-22.5 % PEG 5000 MME, with 180-220 mM (e.g.
200 mM) ammonium iodide and 180-220 mM (e.g. 200 mM) tri-sodium citrate (pH 6.4-6.6).
In a preferred embodiment, this reservoir buffer may also contain from 0 to 5%
glycerol, e.g. about 2.5% glycerol. The growing of the crystal is by vapour diffusion and is
performed by placing an aliquot of the protein solution on a cover slip as a hanging drop
above a well containing the reservoir buffer. The concentration of the protein solution used
was approximately 7 mg/ml.
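
By way of illustration only, the reservoir compositions spanned by the above ranges may be
enumerated as a screening grid. The following sketch assumes step sizes (2% PEG, 20 mM for
each salt, 2.5% glycerol) and a single mid-range citrate pH of 6.5; these steps, the function
names and the dictionary keys are hypothetical choices made for the example and are not
taken from the protocol described herein.

```python
from itertools import product

def stepped(start, stop, step):
    """Inclusive list of values from start to stop in increments of step."""
    count = int(round((stop - start) / step))
    return [round(start + i * step, 3) for i in range(count + 1)]

def reservoir_grid(peg_step=2.0, salt_step=20.0, glycerol_step=2.5, ph=6.5):
    """Enumerate reservoir buffers across the ranges stated in the text.

    Step sizes and the fixed pH are assumptions made for this sketch.
    """
    peg = stepped(18.0, 26.0, peg_step)          # % PEG 5000 MME
    iodide = stepped(180.0, 220.0, salt_step)    # mM ammonium iodide
    citrate = stepped(180.0, 220.0, salt_step)   # mM tri-sodium citrate
    glycerol = stepped(0.0, 5.0, glycerol_step)  # % glycerol
    return [
        {
            "peg_5000_mme_pct": p,
            "ammonium_iodide_mM": i,
            "trisodium_citrate_mM": c,
            "glycerol_pct": g,
            "citrate_pH": ph,
        }
        for p, i, c, g in product(peg, iodide, citrate, glycerol)
    ]

conditions = reservoir_grid()
print(len(conditions), "candidate reservoir conditions")
```

Each entry in the grid corresponds to one well of a hanging-drop experiment of the kind
described above.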

Other crystals of the invention include crystals which have selected coordinates of the 
binding pocket, wherein the amino acid residues associated with those selected coordinates
are located in a protein framework which holds these amino acids in a relative spatial
configuration corresponding to the spatial configuration of those amino acids in Table 1.
By "corresponding to", it is meant within an r.m.s.d. of less than 2.0 Å, preferably less than
1.5 Å, more preferably less than 1.0 Å, even more preferably less than 0.74 Å, even more
preferably less than 0.72 Å and most preferably less than 0.5 Å from the Cα or backbone
atoms of Table 1, preferably the Cα atoms.

Crystals of the invention also include crystals of BACE mutants (muteins). In addition,
BACE mutants may be crystallized in co-complex with known BACE substrates or 
inhibitors or novel compounds. 

As explained herein, a mutant BACE (or BACE mutein) is a BACE protein characterized by 
the replacement or deletion of at least one amino acid from the wild type BACE. Such a
mutant may be prepared for example by site-specific mutagenesis, or incorporation of 
natural or unnatural amino acids. 

As explained herein, the present invention therefore contemplates BACE mutants (or 
muteins) as hereinbefore defined. 

For example, the BACE mutants may define a polypeptide which is obtained by replacing at
least one amino acid residue in a native or synthetic BACE with a different amino acid
residue and/or by adding and/or deleting amino acid residues within the native polypeptide
or at the N- and/or C-terminus of a polypeptide corresponding to BACE, and which has
substantially the same three-dimensional structure as BACE from which it is derived. By
having substantially the same three-dimensional structure is meant having a set of atomic
structure co-ordinates that have a root mean square deviation (r.m.s.d.) of less than or equal
to about 2.0 Å (preferably less than 1.5 Å, more preferably less than 1.0 Å, even more
preferably less than 0.74 Å, even more preferably less than 0.72 Å and most preferably less
than 0.5 Å) when superimposed with the atomic structure co-ordinates of the BACE from
which the mutant is derived when at least about 50% to 100% of the Cα atoms of the BACE
are included in the superposition. A mutant may have, but need not have, enzymatic or
catalytic activity.

To produce homologues or mutants, amino acids present in the said protein can be replaced
by other amino acids having similar properties, for example hydrophobicity, hydrophobic
moment, antigenicity, propensity to form or break α-helical or β-sheet structures, and so on.
Substitutional variants of a protein are those in which at least one amino acid in the protein
sequence has been removed and a different residue inserted in its place. Amino acid
substitutions are typically of single residues but may be clustered depending on functional
constraints, e.g. at a crystal contact. Preferably amino acid substitutions will comprise
conservative amino acid substitutions. Insertional amino acid variants are those in which
one or more amino acids are introduced. This can be amino-terminal and/or carboxy-terminal
fusion as well as intrasequence. Examples of amino-terminal and/or carboxy-terminal
fusions are affinity tags, MBP tag, and epitope tags.

Deletional variants are those in which one or more amino acids are removed. This can be
amino-terminal and/or carboxy-terminal, or in an internal region (for example a loop 
region), for example to remove or shorten that region. 

Amino acid substitutions, deletions and additions that do not significantly interfere with the 
three-dimensional structure of the BACE will depend, in part, on the region of the BACE 
where the substitution, addition or deletion occurs. In highly variable regions of the
molecule, non-conservative substitutions as well as conservative substitutions may be 
tolerated without significantly disrupting the three-dimensional structure of the molecule. 
In highly conserved regions, or regions containing significant secondary structure, 
conservative amino acid substitutions are preferred.

As explained earlier, conservative amino acid substitutions are well known in the art, and
include substitutions made on the basis of similarity in polarity, charge, solubility,
hydrophobicity, hydrophilicity and/or the amphipathic nature of the amino acid residues
involved. For example, negatively charged amino acids include aspartic acid and glutamic
acid; positively charged amino acids include lysine and arginine; amino acids with
uncharged polar head groups having similar hydrophilicity values include the following:
leucine, isoleucine, valine; glycine, alanine; asparagine, glutamine; serine, threonine;
phenylalanine, tyrosine. Other conservative amino acid substitutions are well known in the
art.
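
By way of illustration only, the groupings listed above may be encoded so that a proposed
substitution can be tested for membership of the same group. The sketch below uses the
one-letter amino acid codes and covers only the groups named in the preceding paragraph; the
names CONSERVATIVE_GROUPS and is_conservative are hypothetical, and other conservative
substitutions known in the art are not represented.

```python
# Groups taken from the paragraph above, written in one-letter amino acid codes.
CONSERVATIVE_GROUPS = [
    frozenset("DE"),   # aspartic acid, glutamic acid (negatively charged)
    frozenset("KR"),   # lysine, arginine (positively charged)
    frozenset("LIV"),  # leucine, isoleucine, valine
    frozenset("GA"),   # glycine, alanine
    frozenset("NQ"),   # asparagine, glutamine
    frozenset("ST"),   # serine, threonine
    frozenset("FY"),   # phenylalanine, tyrosine
]

def is_conservative(original, replacement):
    """True if both residues differ and fall within one of the listed groups."""
    original, replacement = original.upper(), replacement.upper()
    if original == replacement:
        return False  # identical residues are not a substitution
    return any(original in g and replacement in g for g in CONSERVATIVE_GROUPS)

print(is_conservative("D", "E"), is_conservative("D", "K"))
```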

In some instances, it may be particularly advantageous or convenient to substitute, delete
and/or add amino acid residues to a BACE binding pocket or catalytic residue in order to
provide convenient cloning sites in the cDNA encoding the polypeptide, to aid in
purification of the polypeptide, to modify compound binding etc. Such substitutions,
deletions and/or additions which do not substantially alter the three dimensional structure of
BACE will be apparent to those having skill in the art.

It should be noted that the mutants (BACE muteins) contemplated herein need not exhibit 
enzymatic activity. Indeed, amino acid substitutions, additions or deletions that interfere
with the catalytic activity of the BACE but which do not significantly alter the
three-dimensional structure of the catalytic region are specifically contemplated by the invention.

Such crystalline polypeptides, or the atomic structure co-ordinates obtained therefrom, can
be used to identify compounds that bind to the protein. 

The crystallization of such mutants and the determination of the three-dimensional
structures by X-ray crystallography rely on the ability of the mutant proteins to yield
crystals that diffract at high resolution. The mutant protein could then be used to obtain
information on compound binding through the determination of mutant protein/ligand
complex structures, which may be characterized using the BACE crystal structure of Table
1.

The mutations can be introduced by site-directed mutagenesis e.g. using a Stratagene 
QuikChange™ Site-Directed Mutagenesis Kit or cassette mutagenesis methods (see e.g. 
Ausubel et al., eds., Current Protocols in Molecular Biology, John Wiley & Sons, Inc., New
York, and Sambrook et al., Molecular Cloning: a Laboratory Manual, 2nd ed., Cold Spring
Harbor Laboratory Press, Cold Spring Harbor, NY, (1989)). 

To the extent that the present invention relates to BACE-ligand complexes and mutant, 
homologue, allelic form, species variant, derivative, mutein and analogue proteins of 
BACE, crystals of such proteins may be formed. The skilled person would recognize that
the conditions provided herein for crystallising BACE may be used to form such crystals. 
Alternatively, the skilled person would use the conditions as a basis for identifying modified 
conditions for forming the crystals. 

Thus the aspects of the invention relating to crystals of BACE may be extended to crystals
of mutant/mutein, homologue, allelic form, species variant or derivative proteins (as defined herein).

D. Crystal Coordinates 

In a further aspect, the invention also provides an apo crystal structure of BACE having the 
three dimensional atomic coordinates of Table 1. An advantageous feature of the structure 
defined by the atomic coordinates is that it has a high resolution of about 1.75 Å. A further
advantageous aspect is the provision of an apo structure of BACE, which contains no ligand
bound, unlike those previously described in the art. This is particularly advantageous as 
ligands can then be easily soaked into the crystal to provide co-complex data without the 
need for removal of any ligand already present, and without the need for time-consuming 
co-crystallisation experiments. 

The BACE structure set out in Table 1 is a monomer structure. This is the first time that a
monomer has been observed crystallographically for this protein. 

Table 1 gives atomic coordinate data for BACE. In Table 1 the third column denotes the 
atom type, the fourth the residue type, the fifth the chain identification, the sixth the residue 
number (the atom numbering as described in Hong et al., 2000), the seventh, eighth and ninth
columns are the X, Y, Z coordinates respectively of the atom in question, the tenth column
the occupancy of the atom, the eleventh the temperature factor of the atom, the twelfth the 
chain identification, and the last, thirteenth column, the atom type. 
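
By way of illustration only, a line laid out in the thirteen columns described above may be
read into a structured record as sketched below. The sketch assumes whitespace-separated
fields and assumes that the first two columns, which are not described above, hold a record
label and an atom serial number. The sample line is invented for the example and is not
taken from Table 1; the names AtomRecord and parse_table1_line are hypothetical.

```python
from typing import NamedTuple

class AtomRecord(NamedTuple):
    atom_name: str       # column 3: atom type, e.g. CA
    residue_name: str    # column 4: residue type
    chain_id: str        # column 5: chain identification
    residue_number: int  # column 6: residue number
    x: float             # columns 7-9: coordinates in Angstroms
    y: float
    z: float
    occupancy: float     # column 10
    b_factor: float      # column 11: temperature factor
    element: str         # column 13: atom type

def parse_table1_line(line):
    """Split one whitespace-separated line into an AtomRecord.

    Columns 1 and 2 are assumed to be a record label and a serial number,
    and column 12 (repeated chain identification) is not stored.
    """
    fields = line.split()
    if len(fields) != 13:
        raise ValueError(f"expected 13 columns, found {len(fields)}")
    return AtomRecord(
        atom_name=fields[2],
        residue_name=fields[3],
        chain_id=fields[4],
        residue_number=int(fields[5]),
        x=float(fields[6]),
        y=float(fields[7]),
        z=float(fields[8]),
        occupancy=float(fields[9]),
        b_factor=float(fields[10]),
        element=fields[12],
    )

# Invented sample line, for illustration only (not a line of Table 1).
SAMPLE = "ATOM 2 CA GLY A 1 10.000 20.000 30.000 1.00 25.00 A C"
record = parse_table1_line(SAMPLE)
print(record.atom_name, record.x, record.y, record.z)
```

A fixed-width file format would need a column-position parser instead of whitespace
splitting.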

Each of the tables is presented in an internally consistent format. For example, in Table 1 
the coordinates of the atoms of each amino acid residue are listed such that the backbone
nitrogen atom is first, followed by the C-alpha backbone carbon atom, designated CA,
followed by the carbon and oxygen of the protein backbone and finally side chain residues
(designated according to one standard convention). Alternative file formats (e.g. a
format consistent with that of the EBI Macromolecular Structure Database (Hinxton, UK))
which may include a different ordering of these atoms, or a different designation of the
side-chain residues, may be used or preferred by others of skill in the art. However it will be
apparent that the use of a different file format to present or manipulate the coordinates of
the Tables is within the scope of the present invention.

The coordinates of Table 1 provide a measure of atomic location in Angstroms, to 3 decimal
places. The coordinates are a relative set of positions that define a shape in three
dimensions, but the skilled person would understand that an entirely different set of
coordinates having a different origin and/or axes could define a similar or identical shape.
Furthermore, the skilled person would understand that varying the relative atomic positions
of the atoms of the structure so that the root mean square deviation of the residue backbone
atoms (i.e. the nitrogen-carbon-carbon backbone atoms of the protein amino acid residues)
is less than 2.0 Å, preferably less than 1.5 Å, more preferably less than 1.0 Å, even more
preferably less than 0.74 Å, even more preferably less than 0.72 Å and most preferably less
than 0.5 Å when superimposed on the coordinates provided in Table 1 for the Cα atoms or
residue backbone atoms, will generally result in a structure which is substantially the same
as the structure of Table 1 in terms of both its structural characteristics and usefulness for
structure-based analysis of BACE-interacting molecular structures.

Likewise the skilled person would understand that changing the number and/or positions of
the water molecules and/or substrate molecules of Table 1 will not generally affect the
usefulness of the structure for structure-based analysis of BACE-interacting structures. Thus
for the purposes described herein as being aspects of the present invention, it is within the
scope of the invention if: the Table 1 coordinates are transposed to a different origin and/or
axes; the relative atomic positions of the atoms of the structure are varied so that the root
mean square deviation of residue backbone atoms is less than 2.0 Å, preferably less than 1.5
Å, more preferably less than 1.0 Å, even more preferably less than 0.74 Å, even more
preferably less than 0.72 Å, and most preferably less than 0.5 Å when superimposed on the
coordinates provided in Table 1 for the Cα or residue backbone atoms; and/or the number
and/or positions of water molecules and/or substrate molecules is varied.

Reference herein to the coordinate data of Table 1 and the like thus includes the coordinate 
data in which one or more individual values of the Table are varied in this way unless 
specified explicitly to the contrary. In a preferred aspect, reference herein to the coordinates
of Table 1 or parts thereof (e.g. selected coordinates) should be taken to include coordinates
having a root mean square deviation of less than 0.72 Å, and preferably less than 0.5 Å,
from the Cα atoms of Table 1 or corresponding parts thereof.

By "root mean square deviation" we mean the square root of the arithmetic mean of the 
squares of the deviations from the mean.

Protein structure similarity is routinely expressed and measured by the root mean square
deviation (r.m.s.d.), which measures the difference in positioning in space between two sets
of atoms. The r.m.s.d. measures the distance between equivalent atoms after their optimal
superposition. The r.m.s.d. can be calculated over all atoms, over residue backbone atoms
(i.e. the nitrogen-carbon-carbon backbone atoms of the protein amino acid residues), main
chain atoms only (i.e. the nitrogen-carbon-oxygen-carbon backbone atoms of the protein
amino acid residues), side chain atoms only or more usually over C-alpha atoms only. For
the purposes of this invention, the r.m.s.d. can be calculated over any of these, using any of
the methods outlined below.
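
By way of illustration only, the calculation of an r.m.s.d. over a chosen subset of atoms may
be sketched as follows. The sketch assumes that the two sets of atoms have already been
paired and superposed, and that each atom is supplied as a name together with its X, Y, Z
coordinates in Angstroms; the function names are hypothetical. The list of limits repeats
the values of 0.5, 0.72, 0.74, 1.0, 1.5 and 2.0 Å recited herein.

```python
import math

BACKBONE_NAMES = {"N", "CA", "C"}            # nitrogen-carbon-carbon backbone atoms
THRESHOLDS = (0.5, 0.72, 0.74, 1.0, 1.5, 2.0)  # limits in Angstroms recited in the text

def rmsd(atoms_a, atoms_b, names=None):
    """R.m.s.d. between paired, already superposed atoms.

    Each atom is (name, (x, y, z)). If names is given, only atoms whose
    name is in that set contribute, e.g. {"CA"} for C-alpha atoms only.
    """
    if len(atoms_a) != len(atoms_b):
        raise ValueError("the two atom lists must have the same length")
    total, count = 0.0, 0
    for (name_a, xyz_a), (name_b, xyz_b) in zip(atoms_a, atoms_b):
        if name_a != name_b:
            raise ValueError("atoms are not paired in the same order")
        if names is not None and name_a not in names:
            continue
        total += sum((p - q) ** 2 for p, q in zip(xyz_a, xyz_b))
        count += 1
    if count == 0:
        raise ValueError("no atoms selected")
    return math.sqrt(total / count)

def tightest_threshold(value):
    """Smallest recited limit that the value falls below, or None if none applies."""
    for limit in THRESHOLDS:
        if value < limit:
            return limit
    return None

# Invented coordinates for one residue, for illustration only.
model_a = [("N", (0.0, 0.0, 0.0)), ("CA", (1.0, 0.0, 0.0)),
           ("C", (2.0, 0.0, 0.0)), ("O", (2.0, 1.0, 0.0))]
model_b = [("N", (0.0, 0.0, 0.0)), ("CA", (1.0, 0.3, 0.0)),
           ("C", (2.0, 0.0, 0.0)), ("O", (2.0, 3.0, 0.0))]
print(rmsd(model_a, model_b, {"CA"}), rmsd(model_a, model_b, BACKBONE_NAMES))
```

In the invented example the side chain-like atom dominates the all-atom value, which is why
the choice of atom subset is stated whenever an r.m.s.d. is quoted.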

Methods of comparing protein structures are discussed in Methods in Enzymology, vol 115,
pg 397-420. The necessary least-squares algebra to calculate r.m.s.d. has been given by
Rossmann and Argos (J. Biol. Chem., vol 250, pp7525 (1975)) although faster methods have
been described by Kabsch (Acta Crystallogr., Section A, A32, 922 (1976); Acta Cryst. A34,
827-828 (1978)), Hendrickson (Acta Crystallogr., Section A, A35, 158 (1979)) and
McLachlan (J. Mol. Biol., vol 128, pp49 (1979)). Some algorithms use an iterative procedure
in which one molecule is moved relative to the other, such as that described by Ferro
and Hermans (Ferro and Hermans, Acta Crystallographica, A33, 345-347 (1977)). Other
methods, e.g. Kabsch's algorithm, locate the best fit directly.

It is usual to consider C-alpha atoms and the r.m.s.d. can then be calculated using programs
such as LSQKAB (Collaborative Computational Project 4. The CCP4 Suite: Programs for
Protein Crystallography, Acta Crystallographica D50, (1994), 760-763), MNYFIT (part of
a collection of programs called COMPOSER, Sutcliffe, M.J., Haneef, I., Carney, D. and
Blundell, T.L. (1987) Protein Engineering, 1, 377-384), MAPS (Lu, G. An Approach for
Multiple Alignment of Protein Structures (1998, in manuscript)), QUANTA (Jones et al.,
Acta Crystallographica A47 (1991), 110-119 and commercially available from Accelrys,
San Diego, CA), Insight (commercially available from Accelrys, San Diego, CA), Sybyl®
(commercially available from Tripos, Inc., St Louis), O (Jones et al., Acta
Crystallographica, A47, (1991), 110-119), and other coordinate fitting programs.
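
By way of illustration only, the r.m.s.d. after optimal superposition of two paired sets of
points may be sketched using only a standard library. The sketch does not reproduce the
code of any of the programs named above. In place of the matrix decomposition used in
Kabsch's algorithm, it uses a quaternion eigenvalue formulation of the same least-squares
problem, with a simple power iteration to find the largest eigenvalue; this substitution,
the iteration count, the starting vector and the function names are assumptions made for
the example. The test points are invented.

```python
import math

def _centered(points):
    """Translate points so that their centroid is at the origin."""
    n = len(points)
    centroid = [sum(p[i] for p in points) / n for i in range(3)]
    return [[p[i] - centroid[i] for i in range(3)] for p in points]

def superposed_rmsd(points_a, points_b, iterations=2000):
    """Minimum r.m.s.d. over all rigid rotations and translations of paired points."""
    if len(points_a) != len(points_b) or not points_a:
        raise ValueError("need two equally sized, non-empty sets of paired points")
    a, b = _centered(points_a), _centered(points_b)
    n = len(a)
    ga = sum(x * x for p in a for x in p)
    gb = sum(x * x for p in b for x in p)
    # Correlation matrix: s[i][j] is the sum over atoms of a_i * b_j.
    s = [[sum(p[i] * q[j] for p, q in zip(a, b)) for j in range(3)] for i in range(3)]
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = s
    # Symmetric 4x4 matrix whose largest eigenvalue is the best achievable overlap.
    m = [
        [sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
        [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
        [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
        [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz],
    ]
    # Shifting by sqrt(ga * gb) makes every eigenvalue non-negative, so the
    # power iteration converges towards the largest one.
    shift = math.sqrt(ga * gb)
    v = [1.0, 0.5, 0.25, 0.125]
    for _ in range(iterations):
        w = [sum(m[i][j] * v[j] for j in range(4)) + shift * v[i] for i in range(4)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    mv = [sum(m[i][j] * v[j] for j in range(4)) for i in range(4)]
    largest = sum(v[i] * mv[i] for i in range(4)) / sum(x * x for x in v)
    return math.sqrt(max(0.0, (ga + gb - 2.0 * largest) / n))

# Invented points: the second set is the first rotated 30 degrees about z and shifted.
reference = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.2, 0.0),
             (0.3, 1.0, 2.0), (-1.0, 0.5, 1.0)]
angle = math.radians(30.0)
moved = [
    (x * math.cos(angle) - y * math.sin(angle) + 5.0,
     x * math.sin(angle) + y * math.cos(angle) - 3.0,
     z + 2.0)
    for x, y, z in reference
]
fit = superposed_rmsd(reference, moved)
print(f"r.m.s.d. after superposition: {fit:.6f}")
```

Because the second set differs from the first only by a rigid motion, the value obtained
after superposition is zero to within rounding error.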

In, for example, the programs LSQKAB and O, the user can define the residues in the two
proteins that are to be paired for the purpose of the calculation. Alternatively, the pairing of
residues can be determined by generating a sequence alignment of the two proteins;
programs for sequence alignment are discussed in more detail in Section G. The atomic
coordinates can then be superimposed according to this alignment and an r.m.s.d. value
calculated. The program Sequoia (C.M. Bruns, I. Hubatsch, M. Ridderstrom, B. Mannervik,
and J.A. Tainer (1999) Human Glutathione Transferase A4-4 Crystal Structures and
Mutagenesis Reveal the Basis of High Catalytic Efficiency with Toxic Lipid Peroxidation
Products, Journal of Molecular Biology 288(3): 427-439) performs the alignment of
homologous protein sequences, and the superposition of homologous protein atomic
coordinates. Once aligned, the r.m.s.d. can be calculated using programs detailed above. For
proteins with identical or highly similar sequences, the structural alignment can be done
manually or automatically as outlined above. Another approach would be to generate a
superposition of protein atomic coordinates without considering the sequence.

It is more usual when comparing significantly different sets of coordinates to calculate the
r.m.s.d. value over C-alpha atoms only. It is particularly useful when analysing side chain 
movement to calculate the r.m.s.d. over all atoms and this can be done using LSQKAB and 
other programs. 

Varying the atomic positions of the atoms of the structure by up to about 0.5 Å in a
concerted way, preferably up to about 0.3 Å in any direction, will result in a structure which
is substantially the same as the structure of Table 1 in terms of both its structural
characteristics and utility e.g. for molecular structure-based analysis. 

Also, modifications in the BACE crystal structure due to e.g. mutations, additions, 
substitutions, and/or deletions of amino acid residues (including the deletion of one or more 
BACE protomers) could account for variations in the BACE atomic coordinates. However,
atomic coordinate data of BACE modified so that a ligand that bound to one or more 
binding sites of BACE would be expected to bind to the corresponding binding sites of the 
modified BACE are, for the purposes described herein as being aspects of the present 
invention, also within the scope of the invention. Reference herein to the coordinates of 
Table 1 thus includes the coordinates modified in this way. Preferably, the modified
coordinate data define at least one BACE binding cavity. 

Those of skill in the art will appreciate that in many applications of the invention, it is not 
necessary to utilise all the coordinates of Table 1, but merely a portion of them. The term 
portion is intended to define a sub-set of the coordinates, which may or may not represent
contiguous amino acid residues in the BACE structure. For example, as described below, in
methods of modelling candidate compounds with BACE, selected coordinates of BACE
may be used, for example at least 5, preferably at least 10, more preferably at least 50 and
even more preferably at least 100 atoms of the BACE structure. Likewise, the other
applications of the invention described herein, including homology modelling and structure
solution, and data storage and computer assisted manipulation of the coordinates, may also
utilise all or a portion of the coordinates of Table 1.

E. Homology Modelling 

The invention also provides a means for homology modelling of other proteins (referred to 
below as target BACE proteins). By "homology modelling", it is meant the prediction of 
related BACE structures based either on X-ray crystallographic data or computer-assisted de
novo prediction of structure, based upon manipulation of the coordinate data of Table 1. 

"Homology modelling" extends to target BACE proteins, which are analogues or 
homologues of the BACE protein whose structure has been determined in the 
accompanying examples. It also extends to mutants of the BACE protein itself.

The term "homologous regions" describes amino acid residues in two sequences that are
identical or have similar (e.g. aliphatic, aromatic, polar, negatively charged, or positively
charged) side-chain chemical groups. Identical and similar residues in homologous regions
are sometimes described as being respectively "invariant" and "conserved" by those skilled
in the art.

In general, the method involves comparing the amino acid sequences of the BACE protein 
of Table 1 with a target BACE protein by aligning the amino acid sequences (Dunbrack et 
al., Folding and Design, 2, (1997), 27-42). Amino acids in the sequences are then 
compared and groups of amino acids that are homologous (conveniently referred to as
"corresponding regions") are grouped together. This method detects conserved regions of
the polypeptides and accounts for amino acid insertions or deletions. 

Homology between amino acid sequences can be determined using commercially available 
algorithms. The programs BLAST, gapped BLAST, BLASTN, PSI-BLAST and BLAST 2 
sequences (provided by the National Center for Biotechnology Information) are widely used 
in the art for this purpose, and can align homologous regions of two amino acid sequences.
These may be used with default parameters to determine the degree of homology between 
the amino acid sequence of the Table 1 protein and other target BACE proteins, which are 
to be modeled. 

Analogues are defined as proteins with similar three-dimensional structures and/or functions 
with little evidence of a common ancestor at a sequence level.

Homologues are defined as proteins with evidence of a common ancestor, i.e. likely to be 
the result of evolutionary divergence and are divided into remote, medium and close sub- 
divisions based on the degree (usually expressed as a percentage) of sequence identity. 

A homologue is defined here as a protein with at least 15% sequence identity or which has
at least one functional domain which is characteristic of BACE.
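
By way of illustration only, the sequence identity limb of this definition may be sketched
as a calculation over two sequences that have already been aligned, with "-" marking a gap.
Conventions for the denominator differ between programs; the sketch assumes that identity
is expressed as a percentage of the aligned positions in which neither sequence has a gap.
The functional domain limb of the definition is not covered, the function names are
hypothetical, and the sequences shown are invented.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of identical residues over gap-free aligned positions."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = matches = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # positions with a gap in either sequence are skipped
        compared += 1
        if res_a.upper() == res_b.upper():
            matches += 1
    return 100.0 * matches / compared if compared else 0.0

def meets_identity_criterion(aligned_a, aligned_b, threshold=15.0):
    """True if the aligned pair reaches the 15% identity figure given in the text."""
    return percent_identity(aligned_a, aligned_b) >= threshold

# Invented aligned fragments, for illustration only.
identity = percent_identity("ACDE-FG", "ACDQKFG")
print(f"{identity:.1f}% identity")
```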

There are two types of homologue: orthologues and paralogues. Orthologues are defined as 
homologous genes in different organisms, i.e. the genes share a common ancestor 
coincident with the speciation event that generated them. Paralogues are defined as
homologous genes in the same organism derived from a gene/chromosome/genome
duplication, i.e. the common ancestor of the genes occurred since the last speciation event. 

The homologues could also be mutants as described in section (C). 

Once the amino acid sequences of the polypeptides with known and unknown structures are 
aligned, the structures of the conserved amino acids in a computer representation of the
polypeptide with known structure are transferred to the corresponding amino acids of the 
polypeptide whose structure is unknown. For example, a tyrosine in the amino acid 
sequence of known structure may be replaced by a phenylalanine, the corresponding 
homologous amino acid in the amino acid sequence of unknown structure. 

The structures of amino acids located in non-conserved regions may be assigned manually
by using standard peptide geometries or by molecular simulation techniques, such as 
molecular dynamics. The final step in the process is accomplished by refining the entire 
structure using molecular dynamics and/or energy minimization. 

Homology modelling as such is a technique that is well known to those skilled in the art 
(see e.g. Greer, Science, Vol. 228, (1985), 1055, and Blundell et al., Eur. J. Biochem., Vol.
172, (1988), 513). The techniques described in these references, as well as other homology 
modelling techniques, generally available in the art, may be used in performing the present 
invention. 

Thus the invention provides a method of homology modelling comprising the steps of: (a)
aligning a representation of an amino acid sequence of a target BACE protein of unknown
three-dimensional structure with the amino acid sequence of the BACE of Table 1 to match
homologous regions of the amino acid sequences; (b) modelling the structure of the
matched homologous regions of said target BACE of unknown structure on the
corresponding regions of the BACE structure as defined by Table 1; and (c) determining a
conformation (e.g. so that favorable interactions are formed within the target BACE of
unknown structure and/or so that a low energy conformation is formed) for said target
BACE of unknown structure which substantially preserves the structure of said matched
homologous regions.

Preferably one or all of steps (a) to (c) are performed by computer modelling.

The aspects of the invention described herein which utilise the BACE structure in silico
may be equally applied to homologue models of BACE obtained by the above aspect of the
invention, and this application forms a further aspect of the present invention. Thus having
determined a conformation of a BACE by the method described above, such a conformation
may be used in a computer-based method of rational drug design as described herein.

The absence of a ligand from our structure is particularly advantageous for modelling of 
other proteins as this structure reveals the native structure of the protein unaffected by 
conformational changes upon ligand binding. 

F. Structure Solution 

The structure of the human BACE can also be used to solve the crystal structure of other
target BACE proteins including other crystal forms of BACE, mutants, and co-complexes of
BACE, where X-ray diffraction data or NMR spectroscopic data of these target BACE 
proteins has been generated and requires interpretation in order to provide a structure. 

In the case of BACE, this protein may crystallize in more than one crystal form. The 
structure coordinates of BACE, or portions thereof, as provided by this invention are
particularly useful to solve the structure of those other crystal forms of BACE. They may
also be used to solve the structure of BACE mutants, BACE co-complexes, or of the 
crystalline form of any other protein with significant amino acid sequence homology to any 
functional domain of BACE. 

In the case of other target BACE proteins, particularly the BACE proteins referred to in
Section C above, the present invention allows the structures of such targets to be obtained 
more readily where raw X-ray diffraction data is generated. 

Thus, where X-ray crystallographic or NMR spectroscopic data is provided for a target
BACE-ligand complex, or a BACE homologue or analogue of unknown three-dimensional
structure, the structure of BACE, as defined by Table 1, may be used to interpret that data to
provide a likely structure for the other BACE by techniques which are well known in the 
art, e.g. phasing in the case of X-ray crystallography and assisting peak assignments in 
NMR spectra.

One method that may be employed for these purposes is molecular replacement. In this
method, the unknown crystal structure, whether it is another crystal form of BACE, a BACE
mutant, or a BACE co-complex, or the crystal of a target BACE protein with amino acid
sequence homology to any functional domain of BACE, may be determined using the
BACE structure coordinates of this invention as provided herein. This method will provide
an accurate structural form for the unknown crystal more quickly and efficiently than
attempting to determine such information ab initio.

Examples of computer programs known in the art for performing molecular replacement are
CNX (Brunger A.T.; Adams P.D.; Rice L.M., Current Opinion in Structural Biology,
Volume 8, Issue 5, October 1998, Pages 606-611; also commercially available from
Accelrys, San Diego, CA) or AMoRe (Navaza, J. (1994). AMoRe: an automated package
for molecular replacement. Acta Cryst. A50, 157-163).

Thus, in a further aspect, the invention provides a method for determining the structure of
a protein, which method comprises: providing the co-ordinates of Table 1, and either (a)
positioning the co-ordinates in the crystal unit cell of said protein so as to provide a
structure for said protein or (b) assigning NMR spectra peaks of said protein by
manipulating the coordinates of Table 1.

In a preferred aspect of this invention the co-ordinates are used to solve the structure of
a target BACE, particularly homologues of BACE, for example aspartic proteases such as
BACE2 or cathepsin E (69% and 37% similarity, respectively).

G. Computer Systems 

In another aspect, the present invention provides systems, particularly a computer system, 
the systems containing either (a) atomic coordinate data according to Table 1, said data 
defining the three-dimensional structure of BACE or at least selected coordinates thereof; 
(b) structure factor data (where a structure factor comprises the amplitude and phase of the
diffracted wave) for BACE, said structure factor data being derivable from the atomic
coordinate data of Table 1; (c) atomic coordinate data of a target BACE protein generated
by homology modelling of the target based on the data of Table 1; (d) atomic coordinate
data of a target BACE protein generated by interpreting X-ray crystallographic data or
NMR data by reference to the data of Table 1; or (e) structure factor data derivable from the
atomic coordinate data of (c) or (d).
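
By way of illustration only, the derivation of a structure factor (amplitude and phase) from
atomic coordinates may be sketched as a sum of one complex term per atom. The sketch makes
several simplifying assumptions: coordinates are supplied as fractions of the unit cell
edges rather than in Angstroms, each atom carries a constant scattering weight, and
temperature factors, occupancies and symmetry-related copies are ignored. The function name
and the example atoms are hypothetical.

```python
import cmath
import math

def structure_factor(hkl, atoms):
    """Amplitude and phase (degrees) of F(hkl) for point atoms.

    atoms is a list of ((x, y, z), weight) with fractional coordinates.
    F(hkl) = sum of weight * exp(2*pi*i*(h*x + k*y + l*z)).
    """
    h, k, l = hkl
    total = 0j
    for (x, y, z), weight in atoms:
        total += weight * cmath.exp(2j * math.pi * (h * x + k * y + l * z))
    return abs(total), math.degrees(cmath.phase(total))

# Invented two-atom example: equal atoms at x = 0 and x = 1/2.
two_atoms = [((0.0, 0.0, 0.0), 1.0), ((0.5, 0.0, 0.0), 1.0)]
print(structure_factor((1, 0, 0), two_atoms))  # contributions cancel
print(structure_factor((2, 0, 0), two_atoms))  # contributions add
```

In the two-atom example the waves scattered by the two atoms are exactly out of step for
the (1, 0, 0) reflection and in step for the (2, 0, 0) reflection.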

For example the computer system may comprise: (i) a computer-readable data storage 
medium comprising data storage material encoded with the computer-readable data; (ii) a 
working memory for storing instructions for processing said computer-readable data; and
(iii) a central-processing unit coupled to said working memory and to said computer- 
readable data storage medium for processing said computer-readable data and thereby 
generating structures and/or performing rational drug design. The computer system may 
further comprise a display coupled to said central-processing unit for displaying said 
structures.

The invention also provides such systems containing atomic coordinate data of target BACE 
proteins wherein such data has been generated according to the methods of the invention 
described herein based on the starting data provided by Table 1.

Such data is useful for a number of purposes, including the generation of structures to 
analyze the mechanisms of action of BACE proteins and/or to perform rational drug design
of compounds which interact with BACE, such as compounds which are inhibitors of 
BACE. 

In another aspect, the invention provides a computer-readable storage medium, comprising
a data storage material encoded with computer readable data, wherein the data are defined
by all or a portion (e.g. selected coordinates as defined herein) of the structure coordinates
of BACE of Table 1, or a homologue of BACE, wherein said homologue comprises
backbone atoms that have a root mean square deviation from the Cα or backbone atoms
(nitrogen-carbonα-carbon) of Table 1 of less than 2 Å, such as not more than 1.5 Å,
preferably less than 1.5 Å, more preferably less than 1.0 Å, even more preferably less than
0.74 Å, even more preferably less than 0.72 Å and most preferably less than 0.5 Å.
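
The root mean square deviation criterion above can be sketched as follows. This is a minimal illustration which assumes that the two sets of Cα (or backbone) coordinates have already been paired atom-for-atom and optimally superposed; the superposition step itself is not shown, and the function names are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation (in the units of the input, e.g. Å)
    between two equally long lists of (x, y, z) tuples.

    Assumes the atoms are paired in order and already superposed.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(
        (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
        for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


def within_rmsd(coords_a, coords_b, cutoff=2.0):
    """True if the RMSD is less than the chosen cutoff (default 2 Å)."""
    return rmsd(coords_a, coords_b) < cutoff
```

The cutoff argument can be set to any of the thresholds recited above (for example 1.5, 1.0 or 0.5 Å).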

The invention also provides a computer-readable data storage medium comprising a data 
storage material encoded with a first set of computer-readable data comprising a Fourier 
transform of at least a portion (e.g. selected coordinates as defined herein) of the structural 
coordinates for BACE according to Table 1; which, when combined with a second set of
machine readable data comprising an X-ray diffraction pattern of a molecule or molecular
complex of unknown structure, using a machine programmed with the instructions for using 
said first set of data and said second set of data, can determine at least a portion of the 
structure coordinates corresponding to the second set of machine readable data. 

In a further aspect, the present invention provides computer readable media with at
least one of: (a) atomic coordinate data according to Table 1 recorded thereon, said data
defining the three-dimensional structure of BACE, or at least selected coordinates thereof;
(b) structure factor data for BACE recorded thereon, the structure factor data being
derivable from the atomic coordinate data of Table 1; (c) atomic coordinate data of a target
BACE protein generated by homology modelling of the target based on the data of Table 1;
(d) atomic coordinate data of a BACE-ligand complex or a BACE homologue or analogue
generated by interpreting X-ray crystallographic data or NMR data by reference to the data
of Table 1; and (e) structure factor data derivable from the atomic coordinate data of (c) or
(d).

By providing such computer readable media, the atomic coordinate data can be routinely
accessed to model BACE or selected coordinates thereof. For example, RASMOL (Sayle et
al., TIBS, Vol. 20, (1995), 374) is a publicly available computer software package which
allows access and analysis of atomic coordinate data for structure determination and/or
rational drug design.
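
By way of a hedged illustration of accessing atomic coordinate data from a computer readable medium, the sketch below reads ATOM/HETATM records laid out in the fixed-column Protein Data Bank format. It is an assumption of this example, not a statement of this specification, that the coordinate data are stored in that format; the `Atom` type and `parse_atom_records` function are hypothetical names.

```python
from typing import List, NamedTuple


class Atom(NamedTuple):
    serial: int
    name: str
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float


def parse_atom_records(text: str) -> List[Atom]:
    """Extract atoms from PDB-style text using the fixed column layout.

    Columns (1-based): serial 7-11, atom name 13-16, residue name 18-20,
    chain 22, residue number 23-26, x 31-38, y 39-46, z 47-54.
    """
    atoms = []
    for line in text.splitlines():
        if line.startswith(("ATOM  ", "HETATM")):
            atoms.append(
                Atom(
                    serial=int(line[6:11]),
                    name=line[12:16].strip(),
                    res_name=line[17:20].strip(),
                    chain=line[21],
                    res_seq=int(line[22:26]),
                    x=float(line[30:38]),
                    y=float(line[38:46]),
                    z=float(line[46:54]),
                )
            )
    return atoms
```

Records of other types (for example header or remark lines) are skipped, so the function can be applied to a complete coordinate file.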

On the other hand, structure factor data, which are derivable from atomic coordinate data
(see e.g. Blundell et al., in Protein Crystallography, Academic Press, New York, London 
and San Francisco, (1976)), are particularly useful for calculating e.g. difference Fourier 
electron density maps. 

A further aspect of the invention provides a method of providing data for generating 
structures and/or performing rational drug design for BACE, BACE homologues or
analogues, complexes of BACE with a potential modulator, or complexes of BACE
homologues or analogues with potential modulators, the method comprising:

(i) establishing communication with a remote device containing computer-readable data
comprising at least one of: (a) atomic coordinate data according to Table 1, said data
defining the three-dimensional structure of BACE, at least one sub-domain of the
three-dimensional structure of BACE, or the coordinates of a plurality of atoms of BACE; (b)
structure factor data for BACE, said structure factor data being derivable from the atomic
coordinate data of Table 1; (c) atomic coordinate data of a target BACE homologue or
analogue generated by homology modelling of the target based on the data of Table 1; (d)
atomic coordinate data of a protein generated by interpreting X-ray crystallographic data or
NMR data by reference to the data of Table 1; and (e) structure factor data derivable from
the atomic coordinate data of (c) or (d); and (ii) receiving said computer-readable data from
said remote device.

Thus the remote device may comprise e.g. a computer system or computer readable media
of one of the previous aspects of the invention. The device may be in a different country or
jurisdiction from where the computer-readable data is received. The communication may
be via the internet, intranet, e-mail etc. Typically the communication will be electronic in
nature, but some or all of the communication pathway may be optical, for example, over
optical fibres. Additionally, the communication may be through radio signals or satellite
transmissions.

H. Uses of the Crystals of the Invention 

The crystal structures obtained according to the present invention (including the structure of
Table 1 as well as the structures of target BACE proteins obtained in accordance with the
methods described herein) may be used in several ways for drug design.

By identifying conditions under which high quality crystals of apo-BACE can be produced 
(i.e. crystals which can diffract X-rays for the determination of atomic coordinates to a 
resolution of better than 2.5 Å), the present invention facilitates the identification of
modulators of BACE activity. 

The invention is particularly suitable for the design, screening, development and
optimization of BACE inhibitor components. It is thus a preferred aspect of the invention
that modulators are inhibitors.

In a further aspect, the invention provides a method for determining the structure of a 
compound bound to BACE, said method comprising: (a) providing a crystal of BACE
according to the invention; (b) soaking the crystal with said compound; and (c)
determining the structure of said BACE-compound complex by employing the data of Table
1.

Alternatively, the BACE and compound may be co-crystallized. Thus the invention 
provides a method for determining the structure of a compound bound to BACE, said
method comprising: mixing the protein with the compound(s); crystallizing the
protein-compound(s) complex; and determining the structure of said BACE-compound(s) complex
by reference to the data of Table 1. 

A mixture of compounds may be soaked or co-crystallized with the crystal, wherein only 
one or some of the compounds may be expected to bind to the BACE. As well as the
structure of the complex, the identity of the complexing compound(s) is/are then 
determined. 

In either case, a substrate or a substrate analogue may optionally be present.

The method may comprise the further steps of: (a) obtaining or synthesising said candidate 
modulator; (b) forming a complex of BACE and said candidate modulator; and (c)
analysing said complex by X-ray crystallography or NMR spectroscopy to determine the
ability of said candidate modulator to interact with BACE. 

The analysis of such structures may employ (i) X-ray crystallographic diffraction data from 
the complex and (ii) a three-dimensional structure of BACE, or at least selected coordinates 
thereof, to generate a difference Fourier electron density map of the complex, the
three-dimensional structure being defined by atomic coordinate data according to Table 1. The
difference Fourier electron density map may then be analyzed, to identify the binding mode 
of the modulator. 
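
The calculation of a difference Fourier map can be illustrated with the following minimal sketch. It assumes the common convention in which each coefficient is the difference between the observed amplitude for the complex and the amplitude calculated from the known structure, combined with the calculated phase. For brevity the synthesis is one-dimensional and unscaled (no division by the cell volume); the function names are hypothetical.

```python
import cmath
import math


def difference_coefficient(f_obs_amplitude, f_calc):
    """Return (|F_obs| - |F_calc|) * exp(i * phi_calc).

    f_obs_amplitude: observed amplitude for the complex (a real number).
    f_calc:          complex structure factor calculated from the known
                     (uncomplexed) structure, which supplies the phase.
    """
    phase = cmath.phase(f_calc)
    return (f_obs_amplitude - abs(f_calc)) * cmath.exp(1j * phase)


def synthesize_1d(coefficients, n_grid):
    """Unscaled 1-D Fourier synthesis on n_grid evenly spaced points.

    coefficients: dict mapping index h to a complex coefficient c_h.
    Returns rho(x_k) = sum_h Re(c_h * exp(-2*pi*i*h*x_k)), x_k = k / n_grid.
    """
    density = []
    for k in range(n_grid):
        x = k / n_grid
        value = sum(
            c * cmath.exp(-2j * math.pi * h * x) for h, c in coefficients.items()
        )
        density.append(complex(value).real)
    return density
```

Positive features in such a map indicate electron density present in the complex but absent from the model used for the phases, which is how a bound modulator may be located.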

Therefore, such complexes can be crystallized and analyzed using X-ray diffraction 
methods, e.g. according to the approach described by Greer et al., J. of Medicinal
Chemistry, Vol. 37, (1994), 1035-1054, and difference Fourier electron density maps can be
calculated based on X-ray diffraction patterns of soaked crystals of BACE or co-crystallized
BACE and the solved structure of uncomplexed BACE. These maps can then be analyzed
e.g. to determine whether and where a particular compound binds to BACE and/or changes
the conformation of BACE.

Electron density maps can be calculated using programs such as those from the CCP4 
computing package (Collaborative Computational Project 4. The CCP4 Suite: Programs for 
Protein Crystallography, Acta Crystallographica, D50, (1994), 760-763). For map
visualization and model building, programs such as "O" (Jones et al., Acta
Crystallographica, A47, (1991), 110-119) or "QUANTA" (1994, San Diego, CA:
Molecular Simulations) can be used.

The crystal structures of a series of complexes may then be solved by molecular 
replacement and compared with that of the BACE of Table 1. Potential sites for
modification within the various binding sites of the enzyme may thus be identified. This
information provides an additional tool for determining the most efficient binding 
interactions, for example, increased hydrophobic interactions, between BACE and a 
chemical entity or compound. 

All of the complexes referred to above may be studied using well-known X-ray diffraction
techniques and may be refined against 1.5 to 3.5 Å resolution X-ray data to an R value of
about 0.30 or less using computer software, such as CNX (Brunger et al., Current Opinion
in Structural Biology, Vol. 8, Issue 5, October 1998, 606-611, and commercially available
from Accelrys, San Diego, CA) or X-PLOR (Yale University, ©1992, distributed by
Accelrys), as described by Blundell et al. (1976) and Methods in Enzymology, vol. 114 &
115, H. W. Wyckoff et al., eds., Academic Press (1985).
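
For orientation, the R value referred to above is conventionally the sum of the absolute differences between observed and calculated structure factor amplitudes divided by the sum of the observed amplitudes. The sketch below shows that arithmetic only; it assumes the two lists are already matched reflection-for-reflection and placed on a common scale, and the function name is hypothetical.

```python
def r_factor(f_obs, f_calc):
    """Crystallographic R value: sum(| |Fo| - |Fc| |) / sum(|Fo|).

    f_obs, f_calc: equally long sequences of amplitudes for the same
    reflections, assumed to be on a common scale.
    """
    if len(f_obs) != len(f_calc):
        raise ValueError("amplitude lists must have the same length")
    denominator = sum(abs(o) for o in f_obs)
    if denominator == 0:
        raise ValueError("observed amplitudes sum to zero")
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    return numerator / denominator
```

On this definition a model refined to an R value of about 0.30 or less reproduces the observed amplitudes with an average fractional discrepancy of about 30% or less.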

This information may thus be used to optimize known classes of BACE substrates or 
inhibitors, and more importantly, to design and synthesize novel classes of BACE 
inhibitors. 

Analysing the complex by X-ray crystallography will determine the ability of the candidate
compound to interact with BACE. Analysis of the co-complexes of BACE may involve e.g.
phasing, molecular replacement or calculating a Fourier difference map of the complex as
discussed above. However, with the high resolutions obtainable with the crystal, it may also
be possible to determine the ability of the candidate modulator to interact with BACE
merely by comparing the intensities and/or positions of X-ray diffraction spots from the
complex with e.g. diffraction spots of uncomplexed BACE or a previously identified
BACE-ligand complex. Thus the step of analysing the complex may involve analysing the
intensities and/or positions of X-ray diffraction spots from the complex to determine the
ability of the candidate modulator to interact with BACE.
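
One simple way to compare spot intensities between two data sets is a fractional intensity difference summed over the reflections common to both. The sketch below is an illustrative assumption of how such a comparison might be scored, not a method prescribed by this specification; it assumes both data sets are indexed consistently and on a common scale, and the function name is hypothetical.

```python
def fractional_intensity_change(native, complexed):
    """Return sum(|I_complex - I_native|) / sum(I_native) over shared reflections.

    native, complexed: dicts mapping Miller indices (h, k, l) to intensities.
    Reflections present in only one data set are ignored.
    """
    common = native.keys() & complexed.keys()
    if not common:
        raise ValueError("the data sets share no reflections")
    denominator = sum(native[hkl] for hkl in common)
    if denominator == 0:
        raise ValueError("native intensities sum to zero")
    numerator = sum(abs(complexed[hkl] - native[hkl]) for hkl in common)
    return numerator / denominator
```

A value well above that expected from measurement error alone would be consistent with binding, although any threshold would have to be established empirically.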

Having obtained and characterized a modulator compound according to the invention, the 
invention further provides a method for modulating the activity of BACE which method 
comprises: (a) providing BACE under conditions where, in the absence of modulator, the 
BACE is able to synthesize amyloid β-peptide from amyloid precursor protein (APP); (b)
providing a modulator compound; and (c) determining the extent to which the activity of
BACE is altered by the presence of said compound. 

I. Structure-based Drug Design 

Determination of the three-dimensional structure of BACE provides important information 
about the binding sites of BACE, particularly when comparisons are made with similar 
enzymes. This information may then be used for rational design of BACE inhibitors, e.g.
by computational techniques which identify possible binding ligands for the binding sites, 
by enabling linked-fragment approaches to drug design, and by enabling the identification 
and location of bound ligands using X-ray crystallographic analysis. These techniques are 
discussed in more detail below. 

Greer et al. (1994) describe an iterative approach to ligand design based on repeated
sequences of computer modelling, protein-ligand complex formation and X-ray
crystallographic or NMR spectroscopic analysis. Thus novel thymidylate synthase inhibitor
series were designed de novo by Greer et al., and BACE inhibitors may also be designed in
this way. More specifically, using e.g. GRID on the solved 3D structure of BACE, a
ligand (e.g. a potential inhibitor) for BACE may be designed that complements the
functionalities of the BACE binding sites. The ligand can then be synthesised, formed into
a complex with BACE, and the complex then analysed by X-ray crystallography to identify
the actual position of the bound ligand. The structure and/or functional groups of the ligand
can then be adjusted, if necessary, in view of the results of the X-ray analysis, and the
synthesis and analysis sequence repeated until an optimised ligand is obtained. Related
approaches to structure-based drug design are also discussed in Bohacek et al., Medicinal
Research Reviews, Vol. 16, (1996), 3-50.

Linked-fragment approaches to drug design also require accurate information on the atomic 
coordinates of target receptors. The basic idea behind these approaches is to determine 
(computationally or experimentally) the binding locations of plural ligands to a target
molecule, and then construct a molecular scaffold to connect the ligands together in such a
way that their relative binding positions are preserved. The ligands may be provided
computationally and modelled in a computer system, or provided in an experimental setting,
wherein crystals according to the invention are provided and a plurality of ligands soaked
separately or in mixed pools into the crystal prior to X-ray analysis and determination of
their location. 

The binding sites of two or more ligands are determined and the ligands may be connected to
form a potential lead compound that can be further refined using e.g. the iterative technique
of Greer et al. For a virtual linked-fragment approach see Verlinde et al., J. of
Computer-Aided Molecular Design, 6, (1992), 131-147, and for NMR and X-ray approaches see
Shuker et al., Science, 274, (1996), 1531-1534 and Stout et al., Structure, 6, (1998),
839-848. The use of these approaches to design BACE inhibitors is made possible by the
determination of the BACE structure.

Many of the techniques and approaches to structure-based drug design described above rely 
at some stage on X-ray analysis to identify the binding position of a ligand in a
ligand-protein complex. A common way of doing this is to perform X-ray crystallography on the
complex, produce a difference Fourier electron density map, and associate a particular
pattern of electron density with the ligand. However, in order to produce the map (as
explained e.g. by Blundell et al. (1976)) it is necessary to know beforehand the protein 3D
structure (or at least the protein structure factors). Therefore, determination of the BACE
structure also allows difference Fourier electron density maps of BACE-ligand complexes 
to be produced, which can greatly assist the process of rational drug design. 

The provision of the crystal structures of the invention will also allow the development of 
compounds which interact with the binding pocket regions of BACE (for example to act as 
inhibitors of a BACE) based on a fragment linking or fragment growing approach.

For example, the binding of one or more molecular fragments can be determined in the
protein binding pocket by X-ray crystallography. Molecular fragments are typically
compounds with a molecular weight between 100 and 200 Da (Carr et al., 2002). This can
then provide a starting point for medicinal chemistry to optimize the interactions using a
structure-based approach. The fragments can be combined onto a template or used as the
starting point for 'growing out' an inhibitor into other pockets of the protein (Blundell et al.,
2002). The fragments can be positioned in the binding pocket of BACE and then 'grown' to
fill the space available, exploring the electrostatic, van der Waals or hydrogen-bonding
interactions that are involved in molecular recognition. The potency of the original weakly
binding fragment thus can be rapidly improved using iterative structure-based chemical
synthesis.

At one or more stages in the fragment growing approach, the compound may be synthesized 
and tested in a biological system for its activity. This can be used to guide the further 
growing out of the fragment. 

Where two fragment-binding regions are identified, a linked fragment approach may be
based upon attempting to link the two fragments directly, or growing one or both fragments 
in the manner described above in order to obtain a larger, linked structure, which may have 
the desired properties. 

The previous aspects of the invention relate also to fragment linking or fragment growing 
approaches to rational drug design. Thus the step of providing the structure of a candidate
modulator molecule in the previous aspects may be performed by providing the structures of
a plurality of molecular fragments and linking the molecular fragments to form a candidate
modulator molecule. Furthermore the step of fitting the structure of the candidate
modulator molecule in the previous aspects may be performed by fitting the structure of
each of the molecular fragments (before or after the molecular fragments are linked
together). 

For example, the computer-based method of rational drug design may comprise: 

(a) providing the coordinates of at least two atoms of the BACE of Table 1; (b) providing
the structures of a plurality of molecular fragments; (c) fitting the structure of each of the
molecular fragments to the selected coordinates of the BACE; and (d) assembling the
molecular fragments into a single molecule to form a candidate modulator molecule.

In practice, it will be desirable to model a sufficient number of atoms of the BACE as 
defined by the coordinates of Table 1, which represent a binding pocket. Thus, in this 
embodiment of the invention, there will preferably be provided the coordinates of at least 5,
preferably at least 10, more preferably at least 50, even more preferably at least 100, and
preferably at least 500 selected atoms of the BACE structure.

A further aspect of the invention provides a compound having a chemical structure selected 
using the method of any one of the previous aspects, said compound being an inhibitor of 
BACE.

J. Uses of the Coordinates of the Invention in in silico analysis and design 

Although the invention will facilitate the determination of actual crystal structures
comprising BACE and a compound which modulates BACE, current computational
techniques provide a powerful alternative to the need to generate such crystals and generate
and analyze diffraction data. Accordingly, a particularly preferred aspect of the invention
relates to in silico methods directed to the analysis and development of compounds which
interact with BACE structures of the present invention.

The approaches to structure-based drug design described below all require initial
identification of possible compounds for interaction with the target bio-molecule (in this case
BACE). Sometimes these compounds are known e.g. from the research literature.
However, when they are not, or when novel compounds are wanted, a first stage of the drug
design program may involve computer-based in silico screening of compound databases
(such as the Cambridge Structural Database) with the aim of identifying compounds which
interact with the binding site or sites of the target bio-molecule. Screening selection criteria
may be based on pharmacokinetic properties such as metabolic stability and toxicity.
However, determination of the BACE structure allows the architecture and chemical nature
of each BACE binding site to be identified, which in turn allows the geometric and
functional constraints of a descriptor for the potential inhibitor to be derived. The descriptor
is, therefore, a type of virtual 3-D pharmacophore, which can also be used as selection
criteria or filter for database screening.

Thus as a result of the determination of the BACE three-dimensional structure, more purely
computational techniques for rational drug design may also be used to design BACE
inhibitors (for an overview of these techniques see e.g. Walters et al., Drug Discovery
Today, Vol. 3, No. 4, (1998), 160-178; Abagyan, R.; Totrov, M. Curr. Opin. Chem. Biol.
2001, 5, 375-382). For example, automated ligand-receptor docking programs (discussed
e.g. by Jones et al. in Current Opinion in Biotechnology, Vol. 6, (1995), 652-656 and
Halperin, I.; Ma, B.; Wolfson, H.; Nussinov, R. Proteins 2002, 47, 409-443), which require
accurate information on the atomic coordinates of target receptors, may be used to design
potential BACE inhibitors.

The aspects of the invention described herein which utilize the BACE structure in silico
may be equally applied to both the BACE structure of Table 1 and the models of target
BACE proteins obtained by other aspects of the invention. Thus having determined a
conformation of a BACE by the method described above, such a conformation may be used
in a computer-based method of rational drug design as described herein. In addition the
availability of the structure of the BACE will allow the generation of highly predictive
pharmacophore models for virtual library screening or compound design.

Accordingly, the invention provides a computer-based method for the analysis of the
interaction of a molecular structure with a BACE structure of the invention, which
comprises: (a) providing the structure of a BACE of the invention of Table 1; (b) providing
a molecular structure to be fitted to said BACE structure; and (c) fitting the molecular
structure to the BACE structure of Table 1.

In an alternative aspect, the method of the invention may utilize the coordinates of atoms of
interest of BACE, which are in the vicinity of a putative molecular structure binding region,
for example within 10-25 Å of the catalytic regions or within 5-10 Å of a bound compound,
in order to model the pocket in which the structure binds. These coordinates may be used to
define a space, which is then analyzed "in silico". Thus the invention provides a
computer-based method for the analysis of molecular structures which comprises: (a) providing the
coordinates of at least two atoms of a BACE structure of the invention ("selected
coordinates"); (b) providing a molecular structure to be fitted to said
coordinates; and (c) fitting the structure to the selected coordinates of the BACE.

In practice, it will be desirable to model a sufficient number of atoms of the BACE as
defined by the coordinates of Table 1, which represent a binding pocket. Thus, in this
embodiment of the invention, there will preferably be provided the coordinates of at least 5,
preferably at least 10, more preferably at least 50, even more preferably at least 100, and
preferably at least 500 selected atoms of the BACE structure.
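
The selection of atoms in the vicinity of a bound compound can be sketched as a simple distance filter. The example below is illustrative only: it represents atoms as bare (x, y, z) tuples in Å, the function name `atoms_near_ligand` is hypothetical, and the default cutoff of 10 Å is merely one value from the 5-10 Å range mentioned above.

```python
import math


def atoms_near_ligand(protein_atoms, ligand_atoms, cutoff=10.0):
    """Return the protein atoms lying within `cutoff` Å of any ligand atom.

    protein_atoms, ligand_atoms: lists of (x, y, z) tuples in Å.
    The order of the protein atoms is preserved in the result.
    """
    selected = []
    for atom in protein_atoms:
        if any(math.dist(atom, lig) <= cutoff for lig in ligand_atoms):
            selected.append(atom)
    return selected
```

The returned subset constitutes a set of selected coordinates that can then be passed to a fitting or docking step in place of the complete structure.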

In order to provide a three-dimensional structure of compounds to be fitted to a BACE 
structure of the invention, the compound structure may be modelled in three dimensions 
using commercially available software for this purpose or, if its crystal structure is 
available, the coordinates of the structure may be used to provide a representation of the 
compound for fitting to a BACE structure of the invention.

The step of providing the structure of a candidate modulator molecule may involve selecting 
the compound by computationally screening a database of compounds for interaction with 
the binding cavity or cavities. For example, a 3-D descriptor for the potential modulator 
may be derived, the descriptor including geometric and functional constraints derived from 
the architecture and chemical nature of the binding cavity or cavities. The descriptor may
then be used to interrogate the compound database, a potential modulator being a compound 
that has a good match to the features of the descriptor. In effect, the descriptor is a type of 
virtual pharmacophore. 
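
The interrogation of a compound database with a 3-D descriptor can be sketched as follows. This is a deliberately simplified illustration: the descriptor is assumed to be a short list of typed feature points, a compound is assumed to be pre-annotated with such features in one fixed conformation, and a match requires every pairwise distance to agree within a tolerance. The feature labels, function names and tolerance value are hypothetical.

```python
import math
from itertools import permutations


def matches_descriptor(descriptor, compound, tolerance=1.0):
    """True if some assignment of compound features reproduces the descriptor.

    descriptor, compound: lists of (feature_type, (x, y, z)).
    A match needs identical feature types and all pairwise distances
    agreeing within `tolerance` Å (geometric constraints only).
    """
    n = len(descriptor)
    for candidate in permutations(compound, n):
        # Functional constraint: feature types must agree position by position.
        if any(d[0] != c[0] for d, c in zip(descriptor, candidate)):
            continue
        # Geometric constraint: compare every pairwise distance.
        consistent = True
        for i in range(n):
            for j in range(i + 1, n):
                d_ref = math.dist(descriptor[i][1], descriptor[j][1])
                d_cmp = math.dist(candidate[i][1], candidate[j][1])
                if abs(d_ref - d_cmp) > tolerance:
                    consistent = False
        if consistent:
            return True
    return False


def screen(descriptor, database, tolerance=1.0):
    """Return the names of database entries that match the descriptor."""
    return [
        name
        for name, features in database.items()
        if matches_descriptor(descriptor, features, tolerance)
    ]
```

Because only inter-feature distances are compared, the match is independent of how the compound is positioned in space; the exhaustive permutation search is practical only for descriptors with a handful of features.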

In any event, the determination of the three-dimensional structure of BACE provides a basis 
for the design of new and specific ligands for BACE. For example, knowing the
three-dimensional structure of BACE, computer modelling programs may be used to design
different molecules expected to interact with possible or confirmed binding cavities or other 
structural or functional features of BACE. Examples of this are discussed in Schneider, G.; 
Bohm, H. J. Drug Discov. Today 2002, 7, 64-70. 

More specifically, the interaction of a compound with BACE can be examined through the
use of computer modelling using a docking program such as GOLD (Jones et al., J. Mol.
Biol., 245, 43-53 (1995), Jones et al., J. Mol. Biol., 267, 727-748 (1997)), GRAMM
(Vakser, I.A., Proteins, Suppl., 1:226-230 (1997)), DOCK (Kuntz et al., J. Mol. Biol. 1982,
161, 269-288, Makino et al., J. Comput. Chem. 1997, 18, 1812-1825), AUTODOCK
(Goodsell et al., Proteins 1990, 8, 195-202, Morris et al., J. Comput. Chem. 1998, 19,
1639-1662), FlexX (Rarey et al., J. Mol. Biol. 1996, 261, 470-489) or ICM (Abagyan et al.,
J. Comput. Chem. 1994, 15, 488-506). This procedure can include computer fitting of
compounds to BACE to ascertain how well the compound, in view of its shape and chemical
structure, will bind to the BACE.

Computer-assisted, manual examination of the binding site structure of BACE may also be
performed. Programs such as GRID (Goodford, J. Med. Chem., 28, (1985), 849-857),
a program that determines probable interaction sites between molecules with various
functional groups and an enzyme surface, may also be used to analyse the binding cavity or
cavities to predict partial structures of inhibiting compounds.

Computer programs can be employed to estimate the attraction, repulsion, and steric
hindrance of the two binding partners (i.e. the BACE and a candidate modulator).
Generally the tighter the fit, the fewer the steric hindrances, and the greater the attractive
forces, the more potent the potential modulator since these properties are consistent with a
tighter binding constant. Furthermore, the more specificity in the design of a potential drug,
the more likely it is that the drug will not interact with other proteins as well. This will tend
to minimise potential side-effects due to unwanted interactions with other proteins.
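
As a rough illustration of how attraction, repulsion and steric hindrance might be estimated, the sketch below sums a generic 12-6 Lennard-Jones term and a Coulomb term over all protein-ligand atom pairs. It is a toy model and does not reproduce the scoring function of any of the programs named above; the parameter values (well depth, optimal distance, dielectric constant) are illustrative assumptions, and the function names are hypothetical.

```python
import math


def pair_energy(r, epsilon=0.2, r_min=3.8, q1=0.0, q2=0.0, dielectric=4.0):
    """Energy (kcal/mol) of one atom pair separated by r Å (r must be > 0).

    The Lennard-Jones term is steeply repulsive at short range (steric
    hindrance) and has a minimum of -epsilon at r = r_min (attraction).
    The Coulomb term uses 332 kcal*Å/(mol*e^2) with partial charges in e.
    All default parameter values are illustrative only.
    """
    ratio = r_min / r
    lennard_jones = epsilon * (ratio ** 12 - 2.0 * ratio ** 6)
    coulomb = 332.0 * q1 * q2 / (dielectric * r)
    return lennard_jones + coulomb


def interaction_score(protein_atoms, ligand_atoms):
    """Sum pair energies over all protein-ligand atom pairs.

    Atoms are (x, y, z, partial_charge) tuples; lower scores are more
    favourable under this toy model.
    """
    total = 0.0
    for px, py, pz, pq in protein_atoms:
        for lx, ly, lz, lq in ligand_atoms:
            r = math.dist((px, py, pz), (lx, ly, lz))
            total += pair_energy(r, q1=pq, q2=lq)
    return total
```

Under such a model a pose with overlapping atoms scores poorly because the repulsive term dominates, whereas a snug fit with complementary charges scores favourably, in line with the qualitative statement above.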

In another aspect, the present invention provides a method for identifying an agent
compound (e.g. an inhibitor) which modulates BACE activity, comprising the steps of: (a)
employing three-dimensional atomic coordinate data according to Table 1 to characterise at
least one BACE binding site and preferably a plurality of BACE binding sites; (b) providing
the structure of a candidate agent compound; (c) fitting the candidate agent compound to the
binding sites; and (d) selecting the candidate agent compound.

Preferably sufficient binding sites are characterised to define a BACE binding cavity or 
cavities. 

A plurality (for example two, three or four) of (typically spaced) BACE binding sites may
be characterised and a plurality of respective compounds designed or selected. The agent 
compound may then be formed by linking the respective compounds into a larger compound 
which preferably maintains the relative positions and orientations of the respective
compounds at the binding sites. The larger compound may be formed as a real molecule or
by computer modelling. 

In one embodiment a plurality of candidate agent compounds is screened or interrogated
for interaction with the binding sites. In one example, step (b) involves providing the
structures of the candidate agent compounds, each of which is then fitted in step (c) to
computationally screen a database of compounds (such as the Cambridge Structural
Database) for interaction with the binding sites, i.e. the candidate agent compound may be
selected by computationally screening a database of compounds for interaction with the
binding sites (see Martin, J. Med. Chem., vol. 35, 2145-2154 (1992)). In another example, a
3-D descriptor for the agent compound is derived, the descriptor including e.g. geometric
and functional constraints derived from the architecture and chemical nature of the binding
cavity or cavities. The descriptor may then be used to interrogate the compound database,
the identified agent compound being the compound which matches with the features of the
descriptor. In effect, the descriptor is a type of virtual pharmacophore.

In a related aspect, the present invention provides a method for identifying a candidate
modulator (e.g. potential inhibitor) of BACE comprising the steps of: (a) employing a
three-dimensional structure of BACE, at least one sub-domain thereof, or a plurality of atoms
thereof, to characterise at least one BACE binding cavity, the three-dimensional structure
being defined by atomic coordinate data according to Table 1; and (b) identifying the
candidate modulator by designing or selecting a compound for interaction with the binding
cavity.

Detailed structural information can then be obtained about the binding of the compound to 
BACE, and in the light of this information adjustments can be made to the structure or 
functionality of the compound, e.g. to improve its interaction with BACE. The above steps 
may be repeated and re-repeated as necessary.

K. Compound selection 

In another aspect, in place of in silico methods, high throughput screening of compounds to 
select compounds with binding activity may be undertaken, and those compounds which 
show binding activity may be selected as possible candidate modulators, and further
crystallized with BACE (e.g. by co-crystallization or by soaking) for X-ray analysis. The
resulting X-ray structure may be compared with that of Table 1 for a variety of purposes. 

L. Compounds of the invention 

Having designed or selected possible binding candidate modulators (e.g. by in silico 
analysis, "wet" chemical methods, X-ray analysis etc.) by determining those which have
favourable fitting properties (e.g. strong attraction between candidate and BACE), these can 
then be screened for activity. 

Consequently all the methods of compound design and identification outlined above can 
optionally include the steps of: (a) obtaining or synthesising the candidate modulator; and
(b) contacting the candidate modulator with BACE to determine the ability of the candidate
modulator to interact with BACE. 

More preferably, in the latter step the candidate modulator is contacted with BACE under 
conditions to determine its function. 

For example, in the contacting step above the candidate modulator is contacted with BACE 
in the presence of a substrate, and typically a buffer, to determine the ability of said
candidate modulator to inhibit BACE. The substrate may be e.g. APP. So, for example, an
assay mixture for BACE may be produced which comprises the candidate modulator, 
substrate and buffer. 

Detailed structural information can be obtained about the binding of the candidate 
modulator to BACE, and in the light of this information adjustments can be made to the 
structure or functionality of the candidate modulator, e.g. to improve binding to the binding 
cavity or cavities. The above steps may be repeated as necessary. 

Following identification of such a compound, it may be manufactured and/or used in the 
preparation, i.e. manufacture or formulation, of a composition such as a medicament, 
pharmaceutical composition or drug. These may be administered to individuals. 

Thus, the present invention extends in various aspects not only to a compound as provided 
by the invention, but also to a pharmaceutical composition, medicament, drug or other 
composition comprising such a compound, e.g. for treatment (which may include 
preventative treatment) of disease; a method comprising administration of such a 
composition to a patient, e.g. for treatment of disease; use of such an inhibitor in the 
manufacture of a composition for administration, e.g. for treatment of disease; and a method 
of making a pharmaceutical composition comprising admixing such an inhibitor with a 
pharmaceutically acceptable excipient, vehicle or carrier, and optionally other ingredients. 

Thus a further aspect of the present invention provides a method for preparing a 
medicament, pharmaceutical composition or drug, the method comprising: 

(a) identifying a BACE modulator molecule (which may thus be termed a lead compound) 
by a method of any one of the other aspects of the invention disclosed herein; (b) optimising 
the structure of the modulator molecule; and (c) preparing a medicament, pharmaceutical 
composition or drug containing the optimised modulator molecule. 

The above-described processes of the invention may be iterated in that the modified 
compound may itself be the basis for further compound design. 

By "optimising the structure" we mean e.g. adding molecular scaffolding, adding or varying 
functional groups, or connecting the molecule with other molecules (e.g. using a fragment 
linking approach) such that the chemical structure of the modulator molecule is changed 
while its original modulating functionality is maintained or enhanced. Such optimisation is 
regularly undertaken during drug development programmes to e.g. enhance potency, 
promote pharmacological acceptability, increase chemical stability etc. of lead compounds. 

Modifications will be those conventional in the art known to the skilled medicinal chemist, 
and will include, for example, substitutions or removal of groups containing residues which 
interact with the amino acid side chain groups of a BACE structure of the invention. For 
example, the replacements may include the addition or removal of groups in order to 
decrease or increase the charge of a group in a test compound, the replacement of a charged 
group with a group of the opposite charge, or the replacement of a hydrophobic group with 
a hydrophilic group or vice versa. It will be understood that these are only examples of the 
type of substitutions considered by medicinal chemists in the development of new 
pharmaceutical compounds and other modifications may be made, depending upon the 
nature of the starting compound and its activity. 

Compositions may be formulated for any suitable route and means of administration. 
Pharmaceutically acceptable carriers or diluents include those used in formulations suitable 
for oral, rectal, nasal, topical (including buccal and sublingual), vaginal or parenteral 
(including subcutaneous, intramuscular, intravenous, intradermal, intrathecal and epidural) 
administration. The formulations may conveniently be presented in unit dosage form and 
may be prepared by any of the methods well known in the art of pharmacy. 

For solid compositions, conventional non-toxic solid carriers may be used, including, for 
example, pharmaceutical grades of mannitol, lactose, cellulose, cellulose derivatives, starch, 
magnesium stearate, sodium saccharin, talcum, glucose, sucrose, magnesium carbonate, and 
the like. Liquid pharmaceutically administrable compositions can, for 
example, be prepared by dissolving, dispersing, etc., an active compound as defined above 
and optional pharmaceutical adjuvants in a carrier, such as, for example, water, saline, 
aqueous dextrose, glycerol, ethanol, and the like, to thereby form a solution or suspension. 
If desired, the pharmaceutical composition to be administered may also contain minor 
amounts of non-toxic auxiliary substances such as wetting or emulsifying agents, pH 
buffering agents and the like, for example, sodium acetate, sorbitan monolaurate, 
triethanolamine sodium acetate, triethanolamine oleate, etc. Actual 
methods of preparing such dosage forms are known, or will be apparent, to those skilled in 
this art; for example, see Remington's Pharmaceutical Sciences, Mack Publishing 
Company, Easton, Pennsylvania, 15th Edition, 1975. 

Compositions may be used, e.g. for treatment (which may include preventative treatment) of 
a disease such as Alzheimer's disease or Alzheimer's-type pathology in Down's syndrome. 
Thus the invention provides a method comprising administration of such a composition to a 
patient, e.g. for treatment of a disease such as Alzheimer's disease; use of such an agent 
compound in the manufacture of a composition for administration, e.g. for treatment of a 
disease such as Alzheimer's disease; and a method of making a pharmaceutical composition 
comprising admixing such an agent compound with a pharmaceutically acceptable 
excipient, vehicle or carrier, and optionally other ingredients. 



Exemplification 

The invention will now be described with reference to specific Examples. These are merely 
exemplary and for illustrative purposes only: they are not intended to be limiting in any 
way to the scope of the invention described. These examples constitute the best mode 
currently contemplated for practicing the invention. 

BACE protease was expressed at high levels in bacterial cells as insoluble inclusion bodies. 
To prepare functional protein for enzyme assay and structural studies these inclusion bodies 
were solubilised using denaturants; the slow removal of these denaturants allowed the 
formation of the correct tertiary structure. In the method described here, BACE was 
expressed as a pro-sequence and required activation by a protease before becoming fully 
functional. Clostripain was used as the activating protease but produced multiple species of 
BACE as determined by mass spectrometry. In order to obtain a uniform, homogeneous 
protein after activation by clostripain, a number of different constructs were produced. 
These constructs focused on the mutation of two undesirable clostripain cleavage sites 
(following residues R56 and R57). 

Cloning of BACE WT and BACE N->Q 

The full-length DNA coding sequence of BACE was cloned from human cerebellum and 
human dorsal root ganglion (DRG) cDNA by PCR using oligonucleotide primers based on 
the published BACE sequence (EMBL accession no. AF190725). The full-length template 
sequence was obtained by PCR amplification using the following primers: hBACE-sp1 and 
-ap1 were used for primary amplification, hBACE-sp2 and -ap2 for nested PCR. 

The primers were as follows: 

hBACE-sp1   5'-AGCTCCCTCTCCTGAGAAGCCACC-3' (SEQ ID NO: 22) 
hBACE-ap1   5'-CCACAGGTGCCATCTGTGTCTCC-3' (SEQ ID NO: 23) 
hBACE-sp2   5'-CACCAGCACCACCCAGACTTGG-3' (SEQ ID NO: 24) 
hBACE-ap2   5'-AACCACGGAGGTGTGGTCCAGG-3' (SEQ ID NO: 25) 

A cDNA construct encoding a modified BACE form was made as follows. A partial BACE 
cDNA fragment was amplified using the full-length BACE clone as a template with primers 
hBACE_EC(Bam-M-14)_FOR (5' - CGG GAT CCA TGG CGG GAG TGC TGC CTG CC 
- 3') and hBACE_EC(Bam-453)_REV (5' - CGG GAT CCT TAT GAC TCA TCT GTC 
TGT GGA ATG TTG TAG C - 3'). The resulting 1342 bp PCR fragment was subcloned in 
vector pCR2.1-TOPO using the TOPO TA cloning® kit (Invitrogen) according to the 
manufacturer's instructions. The inserts of several resulting clones were fully sequenced 
and a clone containing no PCR mistakes was selected. The insert of this clone was excised 
from the pCR2.1-TOPO construct using the BamHI restriction endonuclease and subcloned 
into vector pET11a (Novagen) linearized with BamHI. The BACE coding sequence (BACE 
WT, SEQ ID 1) in the resulting clones was confirmed by sequence analysis and the 
resulting correct construct was named M-T7-RGSM(BACE14-453)/pET11a. 

Plasmid M-T7-RGSM(BACE14-453)/pET11a encodes a 455 amino acid residue protein 
named BACE WT containing a T7 epitope tag encoded by the pET11a vector sequence (AA 
1 to 11), a linker sequence (AA 12-15; RGSM) and the partial BACE amino acid sequence 
from residue 14 to 453 (AA 16 to 455) (numbering based on SEQ ID 2). The calculated 
molecular mass of the resulting protein is 50.2 kDa. 

The insert from construct M-T7-RGSM(BACE14-453)/pET11a was amplified by 
PCR to incorporate a His6 tag (CAT CAC CAT CAT CAC CAC) just upstream of the stop 
codon and BamHI site. Following cloning of this amplified fragment back into the original 
expression vector, the asparagine residues at positions 153, 172, 223 and 354 (numbers 
refer to the database BACE sequence BACE_HUMAN, P56817 in Swissprot) were mutated 
to glutamine (AAC to CAA) using the QuikChange™ mutagenesis system (Stratagene, used 
according to the manufacturer's instructions), to generate BACE N->Q (SEQ ID 3). 

Introduction of Activation Site Mutations 

BACE WT and BACE N->Q, described above, were mutated using the QuikChange™ site 
directed mutagenesis protocol (Stratagene). Two complementary oligonucleotides were 
designed which spanned the site of the mutation and which incorporated the amino acid 
changes to be made. These oligonucleotides were then used as primers in a PCR reaction 
producing each of the strands of the plasmid with the mutation present; the parental plasmid 
was digested with the methylation-sensitive restriction endonuclease DpnI and the product 
was then transformed into competent E. coli cells. 

Primers were applicable for the mutation of both BACE WT and BACE N->Q due to their 
high sequence homology. Seven constructs were produced; these are detailed below with 
the oligonucleotide sequence used to make the constructs. 

1) BACE WT mutating arginine 56 to lysine and arginine 57 to lysine (SEQ ID 5) 

5' - CCCGAGGAGCCCGGCAAGAAGGGCAGCTTTGTGGAGATG - 3' (SEQ ID NO: 26) 

5' - CATCTCCACAAAGCTGCCCTTCTTGCCGGGCTCCTCGGG - 3' (SEQ ID NO: 27) 

2) BACE WT mutating arginine 57 to lysine (SEQ ID 7) 

5' - CCCGAGGAGCCCGGCCGGAAGGGCAGCTTTGTGGAGATGG - 3' (SEQ ID NO: 28) 

5' - CCATCTCCACAAAGCTGCCCTTCCGGCCGGGCTCCTCGGG - 3' (SEQ ID NO: 29) 

3) BACE WT deleting arginine 57 (SEQ ID 9) 

5' - CCCGAGGAGCCCGGCAGGGGCAGCTTTGTGGAGATGGTGGAC - 3' (SEQ ID NO: 30) 

5' - GTCCACCATCTCCACAAAGCTGCCCCTGCCGGGCTCCTCGGG - 3' (SEQ ID NO: 31) 

4) BACE N->Q mutating arginine 56 to lysine and arginine 57 to lysine (SEQ ID 11) 

5' - CCCGAGGAGCCCGGCAAGAAGGGCAGCTTTGTGGAGATG - 3' (SEQ ID NO: 32) 

5' - CATCTCCACAAAGCTGCCCTTCTTGCCGGGCTCCTCGGG - 3' (SEQ ID NO: 33) 

5) BACE N->Q mutating arginine 57 to lysine (SEQ ID 15) 

5' - CCCGAGGAGCCCGGCCGGAAGGGCAGCTTTGTGGAGATGG - 3' (SEQ ID NO: 34) 

5' - CCATCTCCACAAAGCTGCCCTTCCGGCCGGGCTCCTCGGG - 3' (SEQ ID NO: 35) 

6) BACE N->Q deleting arginine 57 (SEQ ID 17) 

5' - CCCGAGGAGCCCGGCAGGGGCAGCTTTGTGGAGATGGTGGAC - 3' (SEQ ID NO: 36) 

5' - GTCCACCATCTCCACAAAGCTGCCCCTGCCGGGCTCCTCGGG - 3' (SEQ ID NO: 37) 

7) BACE N->Q mutating arginine 56 to lysine and arginine 57 to lysine and removing the 
C-terminal polyhistidine tag (SEQ ID 13) 

5' - CCCGAGGAGCCCGGCAAGAAGGGCAGCTTTGTGGAGATG - 3' (SEQ ID NO: 38) 

5' - CATCTCCACAAAGCTGCCCTTCTTGCCGGGCTCCTCGGG - 3' (SEQ ID NO: 39) 

5' - CCACAGACAGATGAGTCATGACACCATCATCACCACTAAG - 3' (SEQ ID NO: 40) 

5' - CTTAGTGGTGATGATGGTGTCATGACTCATCTGTCTGTGG - 3' (SEQ ID NO: 41) 

After transformation of the plasmid the protein coding region was checked by DNA 
sequencing. 
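
The mutagenesis protocol above relies on each primer pair being fully complementary. As a minimal sketch (not part of the original method), the following Python checks that property for two of the pairs listed above by comparing the reverse complement of the first primer with the second. The dictionary keys and function names are illustrative only; the sequences are copied from SEQ ID NO: 26-29.

```python
# Primer pairs copied from constructs 1 and 2 above (SEQ ID NO: 26-29).
# The labels are illustrative names, not identifiers from the specification.
PRIMER_PAIRS = {
    "R56K/R57K": (
        "CCCGAGGAGCCCGGCAAGAAGGGCAGCTTTGTGGAGATG",
        "CATCTCCACAAAGCTGCCCTTCTTGCCGGGCTCCTCGGG",
    ),
    "R57K": (
        "CCCGAGGAGCCCGGCCGGAAGGGCAGCTTTGTGGAGATGG",
        "CCATCTCCACAAAGCTGCCCTTCCGGCCGGGCTCCTCGGG",
    ),
}

# Translation table mapping each base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_complementary_pair(forward: str, reverse: str) -> bool:
    """True if the two primers anneal to each other over their full length."""
    return reverse_complement(forward) == reverse.upper()


if __name__ == "__main__":
    for name, (fwd, rev) in PRIMER_PAIRS.items():
        print(f"{name}: complementary = {is_complementary_pair(fwd, rev)}")
```

The same check can be applied to any other pair in the list by adding it to the dictionary; a mismatch would point to a transcription error in one of the two strands.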

Protein production (1) 

Plasmid constructs were transformed into BLR(DE3) as follows: 1-2 µl DNA was added 
to 25 µl BLR(DE3) competent cells. Cells were then heat shocked at 42°C for 45 secs, 
followed by incubation for 30 mins at 4°C. The sample was placed on ice for 2-3 mins 
before addition of 125-250 µl SOC medium and left for 60 mins at 37°C. Cells were plated 
out onto agar containing carbenicillin and incubated at 37°C for 16 h. Transformations were 
stored at 4°C. Transformed cells could be used after up to 8 weeks of storage. 

Colonies were inoculated in 100 ml LB broth with 1 mM carbenicillin, and shaken for 16 h at 
25°C. 12 ml of this culture was added to 1 L of the same medium in baffle flasks. The 
typical total culture volume was 12, 20 or 24 L. Cells were induced by addition of 1 mM 
IPTG at an OD600 of approximately 1.0. Cells were harvested 3 to 4 hours after induction by 
centrifugation for 7 min at 16 000 g. Cell pellets were resuspended in 1 litre TN buffer 
(150 mM NaCl, 50 mM Tris, pH 7.5) before addition of 10 mg lysozyme per litre of 
bacterial culture. The suspension was left for 20 mins under vigorous stirring then frozen at 
-70°C. 

The lysates were thawed and adjusted to 1 mM MgCl2 and 20 µl of 10 mg/ml DNase, incubated 
30-60 mins at 20°C, then 0.1% Triton X-100 was added. Inclusion body washes were 
performed in 11 wash steps, spun down at 13,000-16,000 g for 20 mins at room temperature 
then resuspended by sonication in TNT buffer (TN buffer + 0.1% Triton X-100). The washing 
step with TNT was repeated at least three times (up to seven times) until an almost 
homogeneous dark cream precipitate was obtained. At this stage the pellet was washed twice 
with TN buffer. The typical yield for a 12 L culture of BACE WT constructs was 4.5 g 
washed inclusion body material. 

Protein Refolding (1) 

Each g of inclusion bodies was solubilised with 22.5 ml of 8 M urea, 50 mM Tris, 0.1 M 
beta-mercaptoethanol, 10 mM DTT, 1 mM EDTA. After 2 to 3 hours under gentle stirring, 
this was spun at 48 400 g for 25 mins. The supernatant was then diluted 1 in 10 in 8 M urea, 
0.2 mM oxidized glutathione, 1.0 mM reduced glutathione. This is the starting solution for 
refolding. 

Refolding was accomplished by dilution into 20 volumes 20 mM Tris, 10 mM NDSB256 
(3-(benzyldimethylammonio)propanesulfonate). The addition was achieved by slowly 
dripping from a burette into a strongly stirred solution. Addition was carried out at room 
temperature. 

The pH was adjusted to approximately 9 using 13.5 ml of 1 N HCl per 5 litres of refolding mix 
either immediately after dilution or 16 h after dilution. This was left at 4°C for 2-3 weeks. 
The refolding mix was then adjusted to pH 8.2 16 h before concentrating. Where a longer 
incubation was applied, yields appeared to be slightly better. No precipitation 
was seen when attempting to refold BACE, even under totally unsuccessful conditions. 

Constructs BACE WT R57K, BACE WT R57DEL, BACE N->Q R57K, and BACE 
R57DEL refolded with lower yields. 

Protein Purification of BACE from refolding step (1) 

The refolded protein sample was concentrated by ultrafiltration using two parallel Vivaflow 
200 cells (MWCO 30 kDa), fed by a single pump. The concentration factor was not more 
than 200 times: if exceeded, precipitation occurred. 

Concentrated refolded BACE was loaded onto and eluted from a 1.75 L Sephacryl 300 column 
run at a flow of 0.2 cm-l/min in 0.4 M urea, 20 mM Tris, 10 mM HCl. Typical loading volume 
was 2% bed volume. From reconcentrated material three peaks are observed: the first 
near the void volume (large aggregates), which merges into a second peak of aggregated 
inactive material. The third peak (eluting at approx. 40% of column volume) constitutes 
active BACE. For BACE WT constructs, the active fraction elutes at approximately 800 ml. 

Activation by Clostripain (1) 

Clostripain (Cp; EC 3.4.22.8, from Worthington or Sigma C7403) was activated before use 
by solubilising the freeze-dried material to 1.25 mg/ml in 20 mM calcium acetate, 8 mM 
DTT, 100 mM Tris, pH 8, and incubating at 4 °C for at least 1 h. The preparation was then 
stable at 4 °C for up to four weeks. 

The third peak (typically 100 ml at an average of 0.3 mg/ml) from Sephacryl 300 elution 
was treated with activated Cp (1/100 dilution) for 30-90 mins at room temperature. 

Activation of BACE WT R56KR57K, BACE N->Q R56KR57K and BACE N->Q 
R56KR57K no His by clostripain was performed as described above except that prior to 
activation the solution was concentrated ten fold using Vivaspin 20 ml (30 kDa MWCO). 

The reaction was stopped by loading onto a Mono Q HR 5/5 column equilibrated in 0.4 M 
urea, 20 mM Tris, 10 mM HCl, 1 mM EDTA followed by washing using the same buffer. 
The protein was eluted with a 0 to 1 M NaCl gradient over 10 column volumes. A typical 
final yield of active soluble BACE WT R56KR57K is 1-2 mg of protein per litre of culture 
grown. The eluted protein was characterised and used in crystallisation assays. 

Protein Production (2) 

BLR(DE3) competent cells were transformed as described earlier and plated onto agar 
containing ampicillin (Amp). A colony was picked into 250 ml LB + 100 µg/ml Amp and 
grown overnight at 37°C, 185 rpm. Following overnight growth (OD600 varied between 2.0 
and 2.5) 10 ml of this culture was used to inoculate 1 L of fresh LB + 100 µg/ml Amp in a 2 L 
baffled flask. Routinely 24 L of fresh LB + Amp would be inoculated from the overnight 
growth. Following inoculation, the 24 L prep would be grown at 37°C, 185 rpm until an 
OD600 = 1.0 was obtained. Protein expression was induced by the addition of IPTG to a final 
concentration of 1 mM. Cultures were incubated for a further 3 hours (at 37°C, 185 rpm) 
before harvesting by centrifugation at 8000 rpm for 10 mins (JLA 8.1000). Cell pellets 
could be stored at -80°C or processed immediately. 

All following protein production procedures were performed at room temperature unless 
stated otherwise. The cell pellet was re-suspended in 500 ml of TN buffer (TN buffer - 150 mM 
NaCl, 50 mM Tris, pH 7.5). 240 mg of egg lysozyme (10 mg/L of bacterial culture) was added 
to the re-suspended pellet. The suspension was left stirring for 20 mins. Following this, 
100 µl of DNase I (10 mg/ml stock) was added to the suspension and this was left stirring for 
20 mins. This lysate was clarified by centrifugation at 8000 rpm for 20 mins (JLA 8.1000). 

The supernatant was discarded and the pellet was re-suspended in 100 ml TNT buffer (TNT 
buffer - 150 mM NaCl, 50 mM Tris, pH 7.5, 0.1% Triton X-100). Effort was made to break 
up any lumps present in the pellet so that a homogeneous re-suspension was obtained. 
Following this, the re-suspension was sonicated for 2 mins (20 sec pulses). 400 ml of TNT 
buffer was added to bring the volume of the suspension up to ~500 ml. This was 
centrifuged for 20 mins at 8000 rpm and the supernatant discarded. The re-suspension in 
TNT buffer and sonication steps, as described above, were repeated twice. Following these 
three TNT washes, the pellet was re-suspended in 100 ml of TN buffer and sonicated for 2 
mins (20 second pulses). The suspension was centrifuged for 20 mins at 8000 rpm. This 
wash in TN buffer was repeated once. Approximately 12-15 g of inclusion bodies was 
obtained from the 24 L of culture. 

Protein Refolding (2) 

The inclusion body preparation was solubilised by addition of 100 ml of solubilisation 
buffer (Sol. Buffer - 8 M urea, 50 mM Tris, 0.1 M beta-mercaptoethanol, 10 mM DTT, 1 mM 
EDTA). Effort was made to break up the inclusion body pellet using a pipette/spatula. The 
solution was left stirring gently overnight. The suspension was centrifuged for 30 mins at 
25,000 rpm (JA25). The supernatant (~100 ml) was diluted by the addition of 900 ml of 8 M 
urea, 0.2 mM oxidised glutathione, 1.0 mM reduced glutathione. 

The 1 L of solubilised inclusion bodies as prepared above was refolded by a further 20x 
dilution. A 250 ml aliquot of solubilised inclusion body prep was added drop-wise to 4.75 L 
of refolding buffer (Refolding buffer - 20 mM Tris, 10 mM NDSB256 
(3-(benzyldimethylammonio)propanesulfonate)). The 4.75 L of refolding buffer was stirred 
vigorously (without foaming) and the 250 ml of inclusion body prep was added using a 
peristaltic pump. Care was taken to add the 250 ml at a fast drop rather than a continuous 
pour. The remaining 750 ml of inclusion body prep was diluted in the same way (250 ml 
into 4.75 L of refolding buffer). The four 5 L vessels were placed at 4°C overnight. 
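
The effect of this 20x dilution on the denaturant and reductant concentrations can be estimated with a short calculation. The sketch below assumes additive volumes, assumes that the refolding buffer contains no urea or reducing agent, and assumes that the diluent used in the preceding 1 in 10 dilution contains no beta-mercaptoethanol or DTT; the function and variable names are illustrative only.

```python
def dilute(conc: float, sample_volume: float, diluent_volume: float) -> float:
    """Concentration after mixing a sample with a diluent lacking the solute.

    Assumes additive volumes; units of the result follow the units of `conc`.
    """
    return conc * sample_volume / (sample_volume + diluent_volume)


# Volumes in litres, taken from Protein Refolding (2).
SUPERNATANT_L, UREA_DILUENT_L = 0.1, 0.9    # first step: 100 ml + 900 ml
ALIQUOT_L, REFOLD_BUFFER_L = 0.25, 4.75     # second step: 250 ml + 4.75 L

# Urea is 8 M in both the supernatant and the first diluent, so it is only
# reduced by the second step.
urea_final_M = dilute(8.0, ALIQUOT_L, REFOLD_BUFFER_L)

# Reductants carried over from the solubilisation buffer are diluted twice.
bme_final_mM = dilute(dilute(100.0, SUPERNATANT_L, UREA_DILUENT_L),
                      ALIQUOT_L, REFOLD_BUFFER_L)
dtt_final_mM = dilute(dilute(10.0, SUPERNATANT_L, UREA_DILUENT_L),
                      ALIQUOT_L, REFOLD_BUFFER_L)

# NDSB256 is present only in the refolding buffer, so the aliquot dilutes it.
ndsb_final_mM = dilute(10.0, REFOLD_BUFFER_L, ALIQUOT_L)

if __name__ == "__main__":
    print(f"urea: {urea_final_M:.2f} M")
    print(f"beta-mercaptoethanol: {bme_final_mM:.2f} mM")
    print(f"DTT: {dtt_final_mM:.3f} mM")
    print(f"NDSB256: {ndsb_final_mM:.2f} mM")
```

Under these assumptions the urea concentration in the refolding mix comes out at about 0.4 M, the same urea concentration used in the chromatography buffers described in this section.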

Following overnight incubation at 4°C, the pH of each 5 L vessel was adjusted to pH 9.0 by 
addition of conc. HCl. The vessels were then placed back at 4°C and left for 3 weeks. 

Protein Purification of BACE from Refolding Step (2) 

Two parallel Vivaflow 200 cells (MWCO 30 kDa) fed by a single peristaltic pump were 
used. Each 5 L of refolding mix was concentrated to ~50 ml. Over-concentrating leads to 
precipitation and should be avoided. The concentration of 5 L of refolding mix took ~2 
hours. The 50 ml of concentrated refolding mix was centrifuged for 25 mins at 25,000 rpm. 
The supernatant was then ready for gel filtration using a Sephacryl S-300 column (100x3.5). 
This method is limited by the volume of concentrated refolding mix that can be loaded 
onto the gel filtration column per run (50 ml). The Sephacryl S-300 column was equilibrated 
with 0.4 M urea, 20 mM Tris, 10 mM HCl and run at a flow rate of 4 ml/min. SDS PAGE 
analysis of peaks 1, 2 and 3 showed the presence of BACE (50 kDa band); however, activity 
assay of all three peaks showed active BACE only in peak 3. Fractions from peak 3 were 
pooled and kept on ice. 

Activation by Clostripain (2) 

Clostripain (Sigma C7403) was prepared by dissolving the protein to a final concentration of 
1.25 mg/ml in 20 mM calcium acetate, 8 mM DTT, 100 mM Tris pH 8.0. The clostripain was 
activated by incubating on ice for 1 hour prior to use. 

Pooled fractions from peak 3 (~100 ml at 0.2 mg/ml) were activated by the addition of a 1/100 
dilution of 1.25 mg/ml clostripain. The reaction was incubated at 37°C in a water bath for 90 
minutes. The reaction was stopped by addition of 1 mM EDTA and placed on ice. Note: 
With each fresh batch of Sigma clostripain, a time trial was performed on a small amount 
of BACE to verify the length of incubation needed at 37°C. The length of incubation varied 
from 30 to 90 mins. Analysis by SDS PAGE clearly showed the appearance of the lower 
molecular weight activated species (~47 kDa) from the larger inactivated species (~50 kDa). 
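
For orientation, the approximate substrate-to-protease mass ratio in this activation step can be worked out from the stated concentrations. The sketch below is illustrative only and assumes that a "1/100 dilution" means one part clostripain stock mixed with 99 parts BACE pool; the names used are not from the original text.

```python
def mix(stock_conc: float, stock_parts: float, other_parts: float) -> float:
    """Concentration of a component after mixing `stock_parts` of its stock
    with `other_parts` of a solution that does not contain it."""
    return stock_conc * stock_parts / (stock_parts + other_parts)


CLOSTRIPAIN_STOCK_MG_ML = 1.25   # activated clostripain stock
BACE_POOL_MG_ML = 0.2            # pooled peak 3 fractions

# ASSUMPTION: 1 part clostripain stock + 99 parts BACE pool.
clostripain_final = mix(CLOSTRIPAIN_STOCK_MG_ML, 1, 99)
bace_final = mix(BACE_POOL_MG_ML, 99, 1)

# Mass of BACE per unit mass of clostripain in the reaction.
bace_to_clostripain = bace_final / clostripain_final

if __name__ == "__main__":
    print(f"clostripain: {clostripain_final * 1000:.1f} ug/ml")
    print(f"BACE: {bace_final:.3f} mg/ml")
    print(f"BACE : clostripain (w/w) = {bace_to_clostripain:.1f} : 1")
```

Because the required incubation time varied between clostripain batches, a ratio of this kind is only a starting point; the time trial described above remains the deciding measurement.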

A Mono Q 5/5 ion exchange column was pre-equilibrated in 0.4 M urea, 20 mM Tris, 10 mM 
HCl. The activated BACE (~50 ml at ~0.2 mg/ml) was loaded onto the Mono Q column at a 
flow rate of 1.0 ml/min. Activated BACE was purified by applying a linear salt gradient 
(0.4 M urea, 20 mM Tris, 10 mM HCl, 1.0 M NaCl) over 20 column volumes. Following 
analysis by SDS PAGE and subsequent activity assay, fractions corresponding to activated 
BACE were pooled and buffer exchanged into crystallisation buffer (20 mM Tris, pH 8.2, 
150 mM NaCl, 1 mM DTT). 

Protein Purification of BACE from Refolding Step (3) 

By using method 3 in conjunction with the S-200 INDEX gel filtration column, all 20 L of 
refolding mix could be processed in one go. 

A Sartocon filtration cassette (MWCO 30 kDa) was used in conjunction with a Watson 
Marlow 623 S high speed pump. This assembly was set up as described in the manufacturer's 
operation manual. The 20 L of refolding mix was concentrated down to ~500 ml in less than 
1 hour. Due to the dead volume in the assembly tubing, the volume could not be reduced 
further. At this stage the 500 ml of concentrated refolding mix was filtered using a 0.2 µm 
filter. The filtered sample was then ready for gel filtration using an S-200 INDEX gel 
filtration column (100x10.0) pre-equilibrated in 0.4 M urea, 20 mM Tris, 10 mM HCl. 
The column was run at a flow rate of 10 ml/min. 

SDS analysis of peaks 1, 2 and 3 showed that BACE was present in all fractions. Activity 
assay showed that only peak 3 contained some BACE activity. Fractions from peak 3 were 
pooled (~250 ml at 0.1 mg/ml). 

Prior to clostripain activation, the BACE sample was concentrated using a Resource Q ion 
exchange column. A 6/1 Resource Q column was pre-equilibrated in 0.4 M urea, 20 mM 
Tris, 10 mM HCl. The BACE sample was loaded onto the column at 7 ml/min. BACE was 
eluted off the column using a linear salt gradient (0.4 M urea, 20 mM Tris, 10 mM HCl, 1 M 
NaCl) over 5 column volumes. This step dramatically reduces the sample volume. 
Prior to clostripain activation, the protein sample is diluted with 0.4 M urea, 
20 mM Tris, 10 mM HCl to reduce the salt concentration and so enable further purification 
using Mono Q. A dilution factor of 5:1 has been used successfully. 

This is then followed by clostripain activation and Mono Q purification as outlined above. 

Protein Characterization 

The quality of the final preparation was evaluated by: 

(a) SDS polyacrylamide gel electrophoresis, performed using commercial gels (Novagen) 
followed by Coomassie Brilliant Blue staining according to the manufacturer's instructions. 
The purity, estimated by scanning a digital image of a gel, was at least 95%. 

(b) Mass spectroscopy. The eluted peak(s) were analysed using ESI-TOF-MS. Mass 
spectroscopy was performed using a Bruker "BioTOF" electrospray time of flight 
instrument. Samples were either diluted by a factor of 1000 straight from storage buffer into 
methanol/water/formic acid (50:48:2 v/v/v), or subjected to reverse phase HPLC separation 
using a C4 column. Calibration was achieved using bombesin and angiotensin I using the 
2+ and 1+ charge states. Data were acquired over the 200 to 2000 m/z range and were 
subsequently processed using Bruker's X-mass program. Mass error was typically 
below 1 in 10 000. 
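
To relate this figure to the fragment data reported in the MS analysis subsections of this document, the relative deviation between predicted and observed masses can be computed directly. The following is a minimal sketch; the labels and function name are illustrative, and the mass pairs are those stated for each fragment. 1 in 10 000 corresponds to 100 parts per million (ppm).

```python
# (label, predicted mass, observed mass) as reported for each fragment.
MEASUREMENTS = [
    ("BACE WT R56KR57K, final fragment", 45781.0, 45783.0),
    ("BACE N->Q R56KR57K, final fragment", 46660.65, 46655.0),
    ("BACE N->Q R56KR57K no His, first intermediate", 45837.80, 45838.30),
    ("BACE N->Q R56KR57K no His, further digested", 44230.11, 44228.03),
]

ONE_IN_TEN_THOUSAND_PPM = 1e6 / 10_000  # 100 ppm


def relative_error_ppm(predicted: float, observed: float) -> float:
    """Absolute deviation of observed from predicted mass, in ppm."""
    return abs(observed - predicted) / predicted * 1e6


if __name__ == "__main__":
    for label, predicted, observed in MEASUREMENTS:
        ppm = relative_error_ppm(predicted, observed)
        flag = "within" if ppm <= ONE_IN_TEN_THOUSAND_PPM else "outside"
        print(f"{label}: {ppm:.1f} ppm ({flag} 1 in 10 000)")
```

This comparison is illustrative only: a deviation between a predicted and an observed fragment mass reflects both instrument accuracy and how well the predicted sequence describes the species actually present.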

MS Analysis of BACE WT R56KR57K (SEQ ID NO: 6) 

Full-length protein: MASMTGGQQMGRGSMAGVLPAHGT... 

Predicted mass of full-length protein: 50147 

Cleavage position: 

MASMTGGQQMGR ↓ GSMAGVLPAHGT... 

Predicted mass of BACE protein: 48911. This is the first intermediate fragment; it is 
obtained very quickly and can be obtained as a stable fragment at lower enzyme 
concentration. 

Cleavage position: 

MASMTGGQQMGRGSMAGVLPAHGTQHGIRLPLRSGLGGAPLGLR ↓ LPRETDEEP... 

Predicted mass of BACE protein: 45781. This is the final fragment obtained in the 
conditions described above. The observed ES-MS spectrum of this fragment deconvolutes to a 
parent mass of 45783. The fragment typically elutes as a single peak from the Mono Q 5/5. 

Mass Spec Analysis of BACE N->Q R56KR57K (SEQ ID NO: 12) 

Predicted mass of full-length protein: 50895 

Cleavage position: 

MASMTGGQQMGRGSMAGVLPAHGTQHGIRLPLRSGLGGAPLGLR ↓ LPRETDEEP... 

Predicted mass of BACE protein: 46660.65. This is the final fragment obtained in the 
conditions described above. The observed ES-MS spectrum of this fragment deconvolutes to a 
parent mass of 46655. The fragment typically elutes as two peaks from the Mono Q 5/5, the 
first corresponding to the desired fragment. 

Mass Spec Analysis of BACE N->Q R56KR57K no His (SEQ ID NO: 14) 

Predicted mass of full-length protein: 50072.73 

Cleavage position: 

MASMTGGQQMGRGSMAGVLPAHGTQHGIRLPLRSGLGGAPLGLR ↓ LPRETDEEP... 

Predicted mass of BACE protein: 45837.80. This is the first intermediate fragment, 
obtained rapidly between 30-60 minutes post activation, and is suitable for crystallisation. 
The observed ES-MS spectrum of this fragment deconvolutes to a parent mass of 45838.30. 
It typically elutes as 2 peaks from the Mono Q 5/5, the first peak corresponding to the 
desired fragment. 

Cleavage position: 

MASMTGGQQMGRGSMAGVLPAHGTQHGIRLPLRSGLGGAPLGLRLPRETDEEPEEPGK ↓ KGSFVEMV... 

Predicted fragment mass: 44230.11. Further digestion beyond 60 minutes promotes the 
formation of the above fragment, which is not suitable for crystallisation. The observed 
ES-MS spectrum of this fragment deconvolutes to a parent mass of 44228.03. This typically 
elutes as peak 2 from the Mono Q 5/5. 

Method for Determining Activity of BACE 

A fluorimetric assay was used to measure the activity of the refolded proteins. Activity of 
the BACE enzyme was measured using the fluorescent peptide 
R-E(EDANS)-E-V-N-L-*D-A-E-F-K(DABCYL)-R-OH (Bachem) as substrate. Assays were carried 
out in 96-well black, flat-bottomed Cliniplates in a final assay volume of 100 µl. The 
reaction rate was monitored at room temperature on a Fluoroskan Ascent plate reader with 
excitation and emission wavelengths of 355 nm and 530 nm respectively. 

To determine the pH profile for the enzyme, 8 nM BACE was incubated with 10 µM 
substrate in 50 mM sodium acetate (pH 3.5-5.5) or MES (pH 5.5-6.5) buffers at varying pH 
values and 5% DMSO. 

For kinetic characterization of the enzyme, 8 nM BACE was incubated with varying 
concentrations of the substrate (2.5 - 80 µM) in 50 mM sodium acetate, pH 5, 5% DMSO 
and the reaction monitored as described above. Kinetic parameters were determined by 
fitting the standard Michaelis-Menten equation, using Prism (GraphPad) software. 1 mM 
OM99 completely inhibits activity. 
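
The kinetic analysis can be illustrated with a small calculation. The original work used non-linear fitting in Prism; the sketch below instead uses the Hanes-Woolf linearisation ([S]/v plotted against [S]) with an ordinary least-squares line, chosen only because it needs no external library. The Vmax and Km values and the resulting rates are synthetic, hypothetical numbers used to show that the fit recovers its inputs; they are not measured data. The substrate concentrations span the 2.5 - 80 µM range stated above.

```python
def michaelis_menten(s: float, vmax: float, km: float) -> float:
    """Initial rate v = Vmax * [S] / (Km + [S])."""
    return vmax * s / (km + s)


def fit_hanes_woolf(substrate, rates):
    """Estimate (Vmax, Km) from the Hanes-Woolf form of the equation.

    [S]/v = [S]/Vmax + Km/Vmax, i.e. a straight line in [S] with
    slope 1/Vmax and intercept Km/Vmax.
    """
    x = list(substrate)
    y = [s / v for s, v in zip(substrate, rates)]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    vmax = 1.0 / slope
    km = intercept / slope
    return vmax, km


# Substrate concentrations in uM (two-fold series over the stated range).
SUBSTRATE_UM = [2.5, 5.0, 10.0, 20.0, 40.0, 80.0]

# HYPOTHETICAL parameters used only to generate noise-free synthetic rates.
TRUE_VMAX, TRUE_KM = 100.0, 10.0
synthetic_rates = [michaelis_menten(s, TRUE_VMAX, TRUE_KM) for s in SUBSTRATE_UM]

vmax_fit, km_fit = fit_hanes_woolf(SUBSTRATE_UM, synthetic_rates)

if __name__ == "__main__":
    print(f"Vmax = {vmax_fit:.2f} (arbitrary units), Km = {km_fit:.2f} uM")
```

With real, noisy plate-reader data a non-linear fit of the untransformed equation, as used in the original analysis, is generally preferable because linearisation distorts the error structure.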

Protein Crystallisation 

The sample of BACE was buffer exchanged into 20 mM Tris.HCl pH 8.2, 150 mM NaCl, 1 
mM DTT and concentrated to approximately 7 mg/ml as determined using its theoretical 
extinction coefficient. Prior to crystallisation, the sample was spun at 55,000 rpm for 30 
min using a Beckman benchtop ultracentrifuge. DMSO was added to a final concentration 
of 3% (v/v). 

Crystals of BACE from BACE WT R56KR57K, BACE N->Q R56KR57K and BACE N->Q 
R56KR57K no His were obtained by the hanging-drop vapour diffusion method at 20 °C using 
1.5 µl of protein and an equivalent volume of reservoir solution. The reservoir solution 
contained 20-24% PEG 5000 MME, 180-220 mM (e.g. 200 mM) ammonium iodide, 
180-220 mM (e.g. 200 mM) tri-sodium citrate (pH 6.4-6.6). In an alternative, the reservoir 
solution may additionally contain 2.5% v/v glycerol. 

Diffraction quality single crystals of BACE WT R56KR57K were obtained by the 
hanging-drop vapour diffusion method at 20 °C using 1.5 µl of protein and an equivalent 
volume of reservoir solution. The reservoir solution contained 20-22.5% PEG 5000 MME, 
180-220 mM (e.g. 200 mM) ammonium iodide, 180-220 mM (e.g. 200 mM) tri-sodium citrate 
(pH 6.4-6.6). 

Crystals appeared within the first week and grew to maximum dimensions within 14 days.
The crystals were hexagonal rods with approximate dimensions of 0.2 x 0.05 x 0.05 mm.
They belonged to the hexagonal space group P6₁22 with cell parameters a = b = 103.2 Å, c
= 169.1 Å, and accommodated one enzyme molecule per asymmetric unit, with a solvent
content of 66 %.
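The relationship between cell parameters, molecules per asymmetric unit and solvent content can be sketched with the usual Matthews estimate. This is an illustrative calculation only: the molecular weight of 38,000 Da is an assumed value, not taken from the text, and the factor 1.23 is the conventional constant for a typical protein density.

```python
import math


def hexagonal_cell_volume(a, c):
    """Unit cell volume (cubic angstroms) for a hexagonal cell, gamma = 120 deg."""
    return a * a * c * math.sin(math.radians(120.0))


def solvent_fraction(cell_volume, asu_per_cell, mol_weight_da, mol_per_asu=1):
    """Matthews estimate: Vm = V / (Z * MW); solvent fraction = 1 - 1.23 / Vm."""
    vm = cell_volume / (asu_per_cell * mol_per_asu * mol_weight_da)
    return 1.0 - 1.23 / vm


volume = hexagonal_cell_volume(103.2, 169.1)
# P6(1)22 has 12 asymmetric units per unit cell.
# 38000.0 Da is an ASSUMED molecular weight, used for illustration only.
fraction = solvent_fraction(volume, 12, 38000.0)
```

Under that assumed molecular weight the estimate falls in the same general range as the 66 % quoted above; the exact figure depends on the molecular weight of the construct actually crystallised.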

Inhibitor Soaking 

BACE inhibitors were dissolved in DMSO to a concentration of 500 mM and then diluted 1
in 10 in a harvesting solution composed of either 220 mM ammonium iodide, 220 mM sodium
cacodylate pH 6.4 and 22% PEG 5K MME, or 100-200 mM sodium citrate pH 5.0, 200 mM
ammonium iodide and 30% PEG 5K MME. Apo-BACE protein crystals were transferred
into the harvesting solution for a period of up to 24 hours prior to being dipped in
cryoprotectant (20% PEG 5000 MME, 200 mM ammonium iodide, 200 mM sodium
cacodylate pH 6.4 and 20% (v/v) glycerol, or 200 mM sodium citrate pH 5.0, 200 mM
ammonium iodide, 30% PEG 5K MME and 20% (v/v) glycerol) containing the inhibitor and
frozen in liquid nitrogen.
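The dilution step above can be checked with a short calculation. This sketch assumes that "1 in 10" means one part DMSO stock in ten parts total volume and that the harvesting solution itself contains no DMSO; both are assumptions, not statements from the text.

```python
def dilute(stock_conc_mm, parts_stock, parts_total):
    """Return (final concentration in mM, volume fraction of the stock solvent)."""
    fraction = parts_stock / parts_total
    return stock_conc_mm * fraction, fraction


# 500 mM inhibitor stock in DMSO, diluted 1 part in 10 parts total (assumed reading).
final_mm, dmso_fraction = dilute(500.0, 1, 10)
```

Under these assumptions the soak contains the inhibitor at one tenth of the stock concentration, with DMSO at one tenth of the final volume.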

Data Collection & Processing 

The structure of apo-BACE was solved from BACE WT R56KR57K to 1.75 Å resolution
using the method of molecular replacement. Prior to data collection, crystals were exposed
briefly to the cryoprotectant described previously before flash freezing. Data were collected at
100 K on beamline ID14-1 at the European Synchrotron Radiation Facility using an ADSC
Quantum4 CCD detector, with a wavelength of 0.934 Å, and processed using MOSFLM
(Leslie, A. G. W. (1992). In Joint CCP4 and ESF-EACMB Newsletter on Protein
Crystallography, vol. 26, Warrington, Daresbury Laboratory). The dataset was scaled using
SCALA (CCP4 - Collaborative Computational Project 4. (1994) The CCP4 Suite: Programs
for Protein Crystallography. Acta Crystallographica D50, 760-763) and the intensities
converted to structure factor amplitudes with TRUNCATE (Evans, P. R. (1997). Scaling of
MAD data. In Recent Advances in Phasing (ed. K. S. Wilson, G. Davies, A. W. Ashton and
S. Bailey), pp. 97-102. Council for the Central Laboratory of the Research Councils
Daresbury Laboratory, Daresbury, UK), from the CCP4 suite of programs (CCP4 -
Collaborative Computational Project 4. (1994) The CCP4 Suite: Programs for Protein
Crystallography. Acta Crystallographica D50, 760-763). Statistics for the processing are
shown in Table 2.

TABLE 2: Data collection statistics for apo-BACE.

Resolution      1.75 Å
Mosaicity       0.34°
Completeness    95.9%
Multiplicity    6.3
Rmerge          0.097
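For orientation, the resolution limit and the wavelength quoted for data collection are related through Bragg's law. The sketch below simply evaluates the scattering angle 2θ at which reflections of a given d-spacing appear; it is a worked illustration and says nothing about the detector geometry actually used.

```python
import math


def bragg_two_theta_deg(wavelength, d_spacing):
    """Scattering angle 2-theta in degrees from Bragg's law (n = 1):
    wavelength = 2 * d * sin(theta)."""
    return 2.0 * math.degrees(math.asin(wavelength / (2.0 * d_spacing)))


# Wavelength 0.934 angstroms and resolution limit 1.75 angstroms, as quoted above.
two_theta = bragg_two_theta_deg(0.934, 1.75)
```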



Structure Determination and Refinement 

The structure of apo-BACE was solved by molecular replacement using the program EPMR
(Kissinger CR, Gehlhaar DK, Fogel DB, Acta Crystallogr D Biol Crystallogr, 1999, vol 55
(Pt 2), 484-91). Initially, it was impossible to know whether the correct space group was
P6₁22 or P6₅22, therefore molecular replacement attempts were performed against both.
Default parameters and a resolution range of 15-4 Å were used in conjunction with the A
chain of 1FKN (Hong et al, 2000) as the search model. A solution was found for P6₁22
with an Rfactor of 0.458 and a correlation coefficient of 0.543. In an attempt to reduce
model bias, the molecular replacement solution was used as the starting point for
ARP/wARP (Morris RJ, Perrakis A, Lamzin VS, Acta Crystallogr D Biol Crystallogr,
2002, vol 58 (Pt 6 No 2), 968-75) to perform automated backbone tracing using warpNtrace
and side chain building via the Side dock procedure. This produced a discontinuous model
composed of 244 out of 385 residues spanning 12 amino acid chains. Cycles of structural
refinement with REFMAC5 (Murshudov, G. N., Vagin, A. A. and Dodson, E. J. (1997).
Refinement of macromolecular structures by the maximum-likelihood method. Acta
Crystallographica, 1997 D53, 240-255) were alternated with manual rebuilding of the
model using QUANTA (Jones et al, Acta Crystallographica A47 (1991), 110-119 and
commercially available from Accelrys, San Diego, CA). The model was extended to 329
residues with chain breaks between 156-170, 255-280 and 311-325. CNX (Brunger et al.,
Current Opinion in Structural Biology, Vol. 8, Issue 5, October 1998, 606-611, and
commercially available from Accelrys, San Diego, CA) composite omit maps were
generated to allow further building of the structure, and finally water molecules were added using
Denlnt (Astex internal software library). Refinement statistics are shown in Table 3.

TABLE 3: Final refinement statistics for apo-BACE

Rwork                                      0.251
Rfree                                      0.284
RMS bond deviation from ideality           0.011 Å
RMS bond angle deviation from ideality     1.30°
Average Bfactor for structure              32.99 Å²



These data indicate that the final structure is of good quality: the Rfactors indicate that the
refined model agrees well with the experimental data, and the RMS deviations from
ideality indicate that the geometry of the model is good.
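The Rfactors referred to above measure the fractional disagreement between observed and calculated structure factor amplitudes. The following is a minimal sketch of that definition. The amplitudes are hypothetical illustration values, and the sketch omits the scale factor between observed and calculated data that refinement programs apply.

```python
def r_factor(f_obs, f_calc):
    """R = sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|) over a set of reflections."""
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    return numerator / denominator


# Hypothetical amplitudes, for illustration only (not data from this work).
f_obs = [100.0, 80.0, 60.0, 40.0]
f_calc = [90.0, 85.0, 50.0, 45.0]
r = r_factor(f_obs, f_calc)
```

Rwork is this quantity evaluated over the reflections used in refinement, and Rfree is the same quantity evaluated over a small set of reflections held out of refinement.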

Description of the Apo Structure of BACE 

The structure of BACE we present here has been solved in the absence of substrate or
inhibitor. This is the first time that such a structure has been described. The solution of this
structure has been possible as we have, for the first time, crystallized BACE without
compound in a form suitable for diffracting X-rays, and hence allowed the determination of
the apo structure of BACE. Under our conditions it crystallizes in space group P6₁22 with a
monomer in the asymmetric unit. This is a novel crystal form of BACE.

The protein chain has been traced in the electron density from residue Phe47p to Ala157,
and then from Ala168 to Asn385. There is no indication as to the position of residues 158 to
167 in the electron density map. In addition to the protein atoms, the model contains 3
iodine atoms and 285 water molecules in its present state of refinement.

The majority of the residues in this form of BACE are well defined, the exceptions being
some exposed residues. Parts of the protein surface are exposed to solvent as a consequence
of the molecular packing within the crystal lattice (Figure 1). Residues 255-259, 271-277
and 310 to 317 are exposed and have high B-factors relative to the body of the protein. In
addition, residues 304 to 309 pack against an exposed loop and are poorly ordered with high
B-factors. There are three disulphide bonds in BACE: two of these are well defined in the
electron density, while the third, between Cys269 and Cys319, has high temperature factors. This
is probably a consequence of its proximity to exposed parts of the protein.

BACE, as it has been solved in this form, is a compact globular protein formed by
two domains: domain 1 comprises residues 47p-146 and domain 2 comprises residues
146-385 (numbering from Hong et al, 2000). The active site lies between these two
domains, and contains the two conserved aspartic acid residues, Asp32 and Asp228, which
define the active sites of aspartic proteinases. In our structure, a single water molecule is
coordinated between these two residues.

The overall fold of the protein is similar to that of 1FKN (Hong et al, 2000), with a few
minor, but potentially significant, changes. Residues 158-166 are ordered in the structure of
BACE in the presence of OM99-2 (in the P2₁ form), and consist of a loop plus a short helix.
In the P6₁22 unliganded form, these residues cannot be seen, and are assumed to be mobile.
This may be a consequence of the crystal packing arrangement in this form. Residues 69-75
are arranged differently in the crystal form described here than in the
crystal structure of the OM99-2 complex. The residues are displaced upward relative to the
active site in the structure without OM99-2. The two molecules can be superposed over all
residues using the program MAPS (MAPS-Multiple Alignment of Proteins Structures
Version 0.2, Sep-7-1999, Guoguang, Lund University, Sweden and Lu, G. An Approach for
Multiple Alignment of Protein Structures (1998, in manuscript)) to give an r.m.s.d. of 0.74
Å. This results in close alignment of the N-terminal residues prior to residue 69 and
those subsequent to residue 75. In contrast, the CA atom of residue 71 is displaced by 3.3 Å, that of
residue 72 by 4.3 Å, and that of residue 73 by 6.0 Å (Figure 2). The reason for this
difference is postulated to be the interaction of OM99-2 backbone residues with the protein
residues, in an arrangement analogous to a beta sheet. This interaction pulls the loop down
over the substrate in the active site, and locks it in position. In the absence of substrate, or
peptidic inhibitor, the loop moves back up again.
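Once two structures have been superposed, per-residue CA displacements and an overall r.m.s.d. of the kind quoted above can be computed directly from the paired coordinates. The sketch below shows the arithmetic only. The coordinates are hypothetical placeholders, and the superposition step itself (performed here by MAPS) is assumed to have been done already.

```python
import math


def distance(p, q):
    """Euclidean distance between two 3D points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def rmsd(coords_a, coords_b):
    """Root mean square deviation over paired, already superposed atoms."""
    squared = [distance(p, q) ** 2 for p, q in zip(coords_a, coords_b)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical CA coordinates (angstroms) for three residues in two
# superposed models; these are NOT coordinates from either structure.
apo = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
complexed = [(0.0, 0.0, 0.0), (3.8, 3.0, 0.0), (7.6, 0.0, 4.0)]

per_residue = [distance(p, q) for p, q in zip(apo, complexed)]
overall = rmsd(apo, complexed)
```

A low overall r.m.s.d. can coexist with large displacements of individual residues, which is the situation described for residues 71 to 73.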

In addition to these local changes in structure, on binding of inhibitor, there appears to be a
slight shift in the domain positions relative to each other, resulting in an average difference
in position in the C-terminal domain CA atoms of about 2.0 Å, when the molecules are
superposed using the N-terminal CA atoms.

The symmetry of the P6₁22 crystal system has resulted in a packing arrangement which
brings part of a symmetry related molecule very close to the active site entrance of BACE.
Gln73 from a symmetry related molecule lies very close to the entrance to the active site of
BACE in this crystal form, and overlaps with the position occupied by P4 Glu in OM99-2.
However, this does not interfere with the usefulness of this crystal system to soak in
inhibitors, as we have shown that these crystals can be used to soak BACE inhibitors into
the active site.

Equivalents

The foregoing description details presently preferred embodiments of the present invention
which are therefore to be considered in all respects as illustrative and not restrictive. Those
skilled in the art will recognize, or be able to ascertain, using no more than routine
experimentation, many equivalents, modifications and variations to the specific
embodiments of the invention described specifically herein. Such equivalents, modifications
and variations are intended to be (or are) encompassed in the scope of the following
paragraphs:

1. A mutant BACE protein, which protein lacks one or more proteolytic cleavage sites
recognized by clostripain (or another protease which recognizes the same cleavage
site as clostripain).

2. The protein of paragraph 1 wherein BACE residues R56 and/or R57 (based on
numbering of SwissProt P56817) are mutated or deleted.

3. The protein of paragraph 2 wherein R56 or R57 is mutated by the substitution of
arginine with lysine.

4. The protein of paragraph 2 wherein R56 and R57 are mutated by the substitution of
arginine with lysine.

5. The protein of any one of the preceding paragraphs which comprises BACE residues 
56 to 396 (based on numbering of SwissProt P56817). 

6. A mutant BACE protein (for example, a mutant BACE protein as defined in any one 
of the preceding paragraphs) which is truncated at the N-terminal up to and including 
R42, R45, G55, R56 or R57. 

7. The protein of any one of paragraphs 1 to 6 truncated at the C-terminal such that at 
least residues 454 et seq. are absent. 

8. The protein of paragraph 7 truncated at the C-terminal such that at least residues 447 
et seq. are absent. 

9. The protein of any one of the preceding paragraphs wherein the asparagine residues 
at positions 153, 172, 223 and 354 are mutated to glutamine residues. 

10. The protein of any one of the preceding paragraphs which is un- or deglycosylated.

11. A mutant BACE protein selected from: (a) SEQ ID 6; (b) SEQ ID 8; (c) SEQ ID 10; 
(d) SEQ ID 12; (e) SEQ ID 14; (f) SEQ ID 16; (g) SEQ ID 18; (h) SEQ ID 19; (i) 
SEQ ID 20; (j) SEQ ID 21.

12. Nucleic acid encoding the protein of any one of the preceding paragraphs. 

13. A vector comprising the nucleic acid of paragraph 12. 

14. A host cell comprising the vector of paragraph 13.

15. A process for producing the protein of any one of paragraphs 1 to 11 comprising the
steps of: (a) culturing the host cell of paragraph 14 under conditions suitable for 
expression of the protein; and optionally (b) isolating the expressed recombinant 
BACE protein. 

16. A process for producing refolded recombinant BACE comprising the steps of: (a) 
solubilising the recombinant BACE; (b) diluting the solubilised BACE into an 
aqueous buffer containing sulfobetaine (for example at a concentration of 10 to 50 
mM); and (c) maintaining the diluted solution at low temperature (for example, 3 to 
6°C) and at high pH (e.g. 9 to 10.5) for at least 2 weeks. 

17. The process of paragraph 16 wherein the recombinant BACE is produced according 
to the process of paragraph 15. 

18. Refolded recombinant BACE produced by, or obtainable by, the process of 
paragraph 16 or paragraph 17.

19. A process for producing a crystal of BACE comprising the step of refolding 
recombinant BACE protein according to the process of paragraph 16 or paragraph 
17. 

20. A process for producing a crystal of BACE comprising the step of growing the
crystal by vapour diffusion using a reservoir buffer that contains 18-26 % PEG 5000
MME (for example, 20-24 % PEG 5000 MME, e.g. 20-22.5 % PEG 5000 MME),
180-220 mM (e.g. 200 mM) ammonium iodide and 180-220 mM (e.g. 200 mM)
tri-sodium citrate (pH 6.4-6.6).

21. The process of paragraph 20 wherein the BACE is recombinant and the process
further comprises the preliminary step of refolding the recombinant BACE according
to the process of paragraph 16 or paragraph 17.

22. The process of any one of paragraphs 18 to 20 further comprising the step of
activating the BACE by clostripain digestion.

23. The process of paragraph 21 wherein the BACE is as defined in any one of 
paragraphs 1 to 10.

24. A crystal of BACE produced by, or obtainable by, the process of any one of
paragraphs 18 to 22. 

25. A crystal of BACE having a hexagonal space group P6₁22.

26. The crystal of paragraph 25 having unit cell dimensions of a=b=103.2 Å, c=169.1 Å,
α=β=90°, γ=120°, and a unit cell variability of 5% in all dimensions.

27. The crystal of paragraph 25 or paragraph 26 which contains one copy of BACE in 
the asymmetric unit. 

28. A crystal of BACE (e.g. a crystal according to any one of paragraphs 24 to 27) 
having a resolution better than 3 Å.

29. The crystal of paragraph 28 having a resolution better than 2.5 Å.

30. The crystal of paragraph 29 having a resolution better than 1.8 Å.

31. A crystal of BACE (e.g. a crystal according to any one of paragraphs 24 to 30)
comprising a structure defined by all or a portion of the co-ordinates of Table 1.

32. The crystal of paragraph 31 comprising a structure defined by a portion of the
coordinates of Table 1 which coordinates relate to: (a) the BACE catalytic domain;
and/or (b) a BACE active site; and/or (c) a BACE binding cavity; and/or (d) selected
amino acid residues of a BACE binding cavity located in a protein framework which
holds the selected amino acids in a relative spatial configuration which corresponds
to the spatial configuration of those amino acids in Table 1; and/or (e) a BACE
binding site.

33. The crystal of paragraph 32 wherein the portion of the coordinates of Table 1
comprise (or consist essentially of) those relating to residues SER71, GLY72,
LEU91, ASP93, GLY95, SER96, VAL130, PRO131, TYR132, THR133, GLN134,
ILE171, ILE179, ILE187, ALA188, ARG189, PRO190, TRP258, TYR259, ASP284,
LYS285, ASP289, GLY291, THR292, THR293, ASN294, ARG296 and ARG368
(based on the numbering of SwissProt P56817).

34. The crystal of paragraph 33 wherein the portion of the coordinates of Table 1
comprise (or consist essentially of) those relating to residues LYS70, SER71,
GLY72, GLN73, GLY74, TYR75, LEU91, VAL92, ASP93, THR94, GLY95,
SER96, SER97, ASN98, TYR129, VAL130, PRO131, TYR132, THR133, GLN134,
GLY135, LYS136, TRP137, LYS168, PHE169, PHE170, ILE171, ASN172, 
SER174, TRP176, GLY178, ILE179, LEU180, GLY181, ALA183, TYR184, 
ALA185, GLU186, ILE187, ALA188, ARG189, PRO190, ASP191, ASP192, 
ARG256, TRP258, TYR259, TYR283, ASP284, LYS285, SER286, ILE287, 
VAL288, ASP289, SER290, GLY291, THR292, THR293, ASN294, LEU295, 
ARG296, GLY325, GLU326, ARG368, VAL370, LYS382, PHE383, ALA384, 
ILE385, SER386, GLN387, SER388, SER389, THR390, GLY391, THR392, 
VAL393, GLY395, ALA396 and ILE447 (based on the numbering of SwissProt 
P56817). 

35. The crystal of any one of paragraphs 24 to 34 which is capable of being soaked with 
compound(s) to form co-complex structures. 

36. The crystal of any one of paragraphs 24 to 35 which is soaked with one or more 
compound(s) to form co-complex structures. 

37. The crystal of any one of paragraphs 24 to 36 wherein the BACE is co-crystallized
with one or more compound(s) to form co-crystallized structures. 

38. The crystal of any one of paragraphs 24 to 35 which is an apo crystal. 

39. The crystal of any one of paragraphs 24 to 38 wherein the BACE is a wild-type 
BACE. 

40. The crystal of paragraph 39 wherein the BACE is a human BACE. 

41. The crystal of paragraph 40 wherein the BACE is a homologue of a human BACE.

42. The crystal of paragraph 41 wherein the homologue is an orthologue or a paralogue 
of a human BACE.

43. The crystal of any one of paragraphs 24 to 38 wherein the BACE is a mutant and/or
recombinant BACE. 

44. The crystal of paragraph 43 wherein the BACE: (a) lacks the BACE transmembrane
and/or cytoplasmic domain(s); and/or (b) lacks one or more glycosylation sites;
and/or (c) comprises one or more peptide tags (for example a his tag); and/or (d)
lacks one or more protease cleavage site(s); and/or (e) is truncated at the N-terminus;
and/or (f) is truncated at the C-terminus; and/or (g) lacks the BACE pro-sequence.

45. The crystal of paragraph 44 wherein the BACE mutant lacks one or more clostripain 
cleavage sites. 

46. The crystal of paragraph 45 wherein BACE residues R56 and/or R57 (based on 
numbering of SwissProt P56817) are mutated or deleted. 

47. The crystal of paragraph 46 wherein R56 or R57 is mutated by the substitution of
arginine with lysine.

48. The crystal of paragraph 46 wherein R56 and R57 are mutated by the substitution of
arginine with lysine.

49. The crystal of any one of paragraphs 43 to 48 wherein the BACE mutant is truncated 
at the N-terminal up to and including R42. 

50. The crystal of any one of paragraphs 43 to 49 wherein the BACE mutant is truncated 
at the C-terminal such that at least residues 396 et seq. are absent.

51. The crystal of paragraph 50 wherein the BACE mutant is truncated at the C-terminal
such that at least residues 387 et seq. are absent. 

52. The crystal of any one of paragraphs 43 to 51 wherein the asparagine residues at 
positions 153, 172, 223 and 354 of the BACE mutant are mutated to glutamine 
residues. 

53. The crystal of any one of paragraphs 24 to 52 wherein the BACE is un- or
deglycosylated.

54. The crystal of paragraph 43 wherein the BACE mutant is selected from: (a) SEQ ID
19; (b) SEQ ID 20; (c) SEQ ID 21.

55. The process of any one of paragraphs 19 to 23 wherein the process produces a crystal
of BACE as defined in any one of paragraphs 24 to 54. 

56. A three-dimensional representation of BACE or of a portion of BACE, which 
representation comprises all or a portion of the coordinates of Table 1.

57. The three-dimensional representation of paragraph 56 which is a model constructed 
from all or a portion of the coordinates of Table 1.

58. The model of paragraph 57 wherein the portion of BACE is a BACE binding cavity 
and the portion of the coordinates of Table 1 comprise those of atoms defining a 
binding site within the binding cavity (for example, wherein the coordinates are as 
defined in paragraph 33 or paragraph 34). 

59. A three-dimensional representation of a compound which fits the model of paragraph 
57 or paragraph 58. 

60. The three-dimensional representation of paragraph 59 which is a model of the 
compound. 

61. The model of paragraph 60 wherein the compound is a pharmacophore.

62. The model of any one of paragraphs 57, 58, 60 or 61 which is: (a) a wire-frame 
model; (b) a chicken-wire model; (c) a ball-and-stick model; (d) a space-filling 
model; (e) a stick-model; (f) a ribbon model; (g) a snake model; (h) an arrow and 
cylinder model; (i) an electron density map; (j) a molecular surface model. 

63. The model of any one of paragraphs 57, 58, 60, 61 or 62 which is in physical form. 

64. The model of any one of paragraphs 57, 58, 60, 61 or 62 which is in electronic form. 

65. The model of paragraph 64 which comprises a graphical image display on a
computer screen.

66. A computer-based method for the analysis of the interaction of a molecular structure
with a BACE structure of the invention, which comprises: (a) providing a BACE 
model as defined in paragraph 57, 58 or 62 to 65; (b) providing a molecular structure 
to be fitted to said BACE model; and (c) fitting the molecular structure to the BACE 
model to produce a compound model as defined in paragraph 60, 61 or 62 to 65. 

67. A computer-based method for the analysis of the interaction of a molecular structure 
with a BACE structure of the invention, which comprises: (a) providing the structure 
of a BACE as defined by the coordinates of Table 1; (b) providing a molecular
structure to be fitted to said BACE structure; and (c) fitting the molecular structure to
the BACE structure of Table 1.

68. A computer-based method for the analysis of molecular structures which comprises: 
(a) providing the coordinates of at least two atoms of a BACE structure as defined in 
Table 1 ("selected coordinates"); (b) providing the structure of a molecular structure 
to be fitted to the selected coordinates; and (c) fitting the structure to the selected 
coordinates of the BACE structure. 

69. The method of paragraph 68 wherein the selected coordinates represent a binding 
pocket. 

70. The method of paragraph 68 or paragraph 69 wherein the selected coordinates are of 
at least 5, 10, 50 or 100 atoms. 

71. The method of paragraph 69 or paragraph 70 wherein the selected coordinates are as
defined in paragraph 33 or paragraph 34.

72. A computer-based method of rational drug design comprising the method of any one 
of paragraphs 66 to 71.

73. A computer-based method of rational drug design comprising: (a)
providing the coordinates of at least two atoms of a BACE structure as defined in
Table 1 ("selected coordinates"); (b) providing the structures of a plurality of
molecular fragments; (c) fitting the structure of each of the molecular fragments to
the selected coordinates; and (d) assembling the molecular fragments into a single
molecule to form a candidate modulator molecule.

74. A method for identifying a candidate modulator (e.g. candidate inhibitor) of BACE 
comprising the steps of: (a) employing a three-dimensional structure of BACE, at 
least one sub-domain thereof, or a plurality of atoms thereof, to characterise at least 
one BACE binding cavity, the three-dimensional structure being defined by atomic 
coordinate data according to Table 1; and (b) identifying the candidate modulator by 
designing or selecting a compound for interaction with the binding cavity. 

75. The method of paragraph 74 wherein the three-dimensional structure of BACE is a 
model as defined in paragraph 57 or paragraph 58. 

76. A method for identifying an agent compound (e.g. an inhibitor) which modulates 
BACE activity, comprising the steps of: (a) employing three-dimensional atomic 
coordinate data according to Table 1 to characterise at least one (e.g. a plurality of) 
BACE binding site(s); (b) providing the structure of a candidate agent compound; (c) 
fitting the candidate agent compound to the binding sites; and (d) selecting the 
candidate agent compound. 

77. The method of paragraph 76 wherein in step (a) the three-dimensional atomic 
coordinate data are employed to create a model as defined in paragraph 57, 58 or 62 
to 65. 

78. The method of any one of paragraphs 73 to 77 further comprising the step of: (a)
obtaining or synthesising the candidate agent or modulator; and (b) contacting the 
candidate modulator with BACE to determine the ability of the candidate modulator 
to interact with BACE. 

79. A method of assessing the ability of a candidate modulator to interact with BACE 
which comprises the steps of: (a) obtaining or synthesising said candidate modulator; 
(b) forming a crystallized complex of BACE and said candidate modulator; and (c) 
analysing said complex by X-ray crystallography or NMR spectroscopy to determine 
the ability of said candidate modulator to interact with BACE.

80. A method for determining the structure of a compound bound to BACE, said method
comprising: (a) mixing BACE with the compound to form a BACE-compound
complex; (b) crystallizing the BACE-compound complex; and (c) determining the
structure of said BACE-compound complex by reference to the data of Table 1.

81. A method for determining the structure of a compound bound to BACE, said method
comprising: (a) providing a crystal of BACE; (b) soaking the crystal with one or
more compound(s) to form a complex; and (c) determining the structure of the
complex by employing the data of Table 1.

82. A method of determining the three dimensional structure of a BACE homologue or 
analogue of unknown structure, the method comprising the steps of: (a) aligning a 
representation of an amino acid sequence of the BACE homologue or analogue with 
the amino acid sequence of the BACE of Table 1 to match homologous regions of 
the amino acid sequences; (b) modelling the structure of the matched homologous 
regions of said target BACE of unknown structure on the corresponding regions of 
the BACE structure as defined by Table 1; and (c) determining a conformation for
the BACE homologue or analogue which substantially preserves the structure of said 
matched homologous regions. 

83. The method of paragraph 82 wherein steps (a) and/or (b) and/or (c) are performed by 
computer modelling. 

84. A method of providing data for generating structures and/or performing rational drug 
design for BACE, BACE homologues or analogues, complexes of BACE with a 
potential modulator, or complexes of BACE homologues or analogues with potential 
modulators, the method comprising: (i) establishing communication with a remote 
device containing computer-readable data comprising at least one of: (a) atomic 
coordinate data according to Table 1, said data defining the three-dimensional
structure of BACE, at least one sub-domain of the three-dimensional structure of 
BACE, or the coordinates of a plurality of atoms of BACE; (b) structure factor data 
for BACE, said structure factor data being derivable from the atomic coordinate data 
of Table 1; (c) atomic coordinate data of a target BACE homologue or analogue 
generated by homology modelling of the target based on the data of Table 1; (d)
atomic coordinate data of a protein generated by interpreting X-ray crystallographic
data or NMR data by reference to the data of Table 1; and (e) structure factor data 
derivable from the atomic coordinate data of (c) or (d); and (ii) receiving said 
computer-readable data from said remote device. 

85. A computer system containing one or more of: (a) atomic coordinate data according 
to Table 1, said data defining the three-dimensional structure of BACE or at least
selected coordinates thereof; (b) structure factor data (where a structure factor
comprises the amplitude and phase of the diffracted wave) for BACE, said structure
factor data being derivable from the atomic coordinate data of Table 1; (c) atomic
coordinate data of a target BACE protein generated by homology modelling of the
target based on the data of Table 1; (d) atomic coordinate data of a target BACE
protein generated by interpreting X-ray crystallographic data or NMR data by 
reference to the data of Table 1; or (e) structure factor data derivable from the atomic 
coordinate data of (c) or (d). 

86. The computer system of paragraph 85 comprising: (i) a computer-readable data 
storage medium comprising data storage material encoded with the computer- 
readable data; (ii) a working memory for storing instructions for processing said 
computer-readable data; and (iii) a central-processing unit coupled to said working 
memory and to said computer-readable data storage medium for processing said 
computer-readable data and thereby generating structures and/or performing rational 
drug design. 

87. The computer system of paragraph 86 further comprising a display coupled to said 
central-processing unit for displaying said structures. 

88. A computer-readable storage medium, comprising a data storage material encoded 
with computer readable data, wherein the data are defined by all or a portion of the 
structure coordinates of BACE of Table 1, or a homologue of BACE, wherein said 
homologue comprises backbone atoms that have a root mean square deviation from 
the backbone atoms (nitrogen-carbonα-carbon) of Table 1 of not more than 1.5 Å.

89. A computer-readable data storage medium comprising a data storage material
encoded with a first set of computer-readable data comprising a Fourier transform of
at least a portion (e.g. selected coordinates as defined herein) of the structural
coordinates for BACE according to Table 1; which, when combined with a second
set of machine readable data comprising an X-ray diffraction pattern of a molecule or 
molecular complex of unknown structure, using a machine programmed with the 
instructions for using said first set of data and said second set of data, can determine 
at least a portion of the structure coordinates corresponding to the second set of 
machine readable data. 

90. A computer readable medium with at least one of: (a) atomic coordinate data 
according to Table 1 recorded thereon, said data defining the three-dimensional 
structure of BACE, or at least selected coordinates thereof; (b) structure factor data
for BACE recorded thereon, the structure factor data being derivable from the atomic 
coordinate data of Table 1; (c) atomic coordinate data of a target BACE protein 
generated by homology modelling of the target based on the data of Table 1; (d) 
atomic coordinate data of a BACE-ligand complex or a BACE homologue or 
analogue generated by interpreting X-ray crystallographic data or NMR data by 
reference to the data of Table 1; and (e) structure factor data derivable from the 
atomic coordinate data of (c) or (d). 

91. A method for determining the structure of a protein, which method comprises:
providing the co-ordinates of Table 1, and either (a) positioning the co-ordinates in
the crystal unit cell of said protein so as to provide a structure for said protein or (b)
assigning NMR spectra peaks of said protein by manipulating the coordinates of
Table 1. 

92. A process for producing a medicament, pharmaceutical composition or drug, the 
process comprising: (a) identifying a BACE modulator molecule according to the 
method as defined in any one of paragraphs 73 to 79; (b) optimising the structure of 
the modulator molecule; and (c) preparing a medicament, pharmaceutical 
composition or drug containing the optimised modulator molecule. 

93. A medicament, pharmaceutical composition or drug produced by, or obtainable by, 
the process of paragraph 92. 

94. A compound identified, produced or obtainable by the process or method of any one 
of paragraphs 73 to 79. 

95. A pharmaceutical composition, medicament, drug or other composition comprising 
the compound of paragraph 94. 

96. The medicament, pharmaceutical composition or drug of paragraph 93, compound of 
paragraph 94 or composition of paragraph 95 for use in medicine, for example for 
use in therapy or prophylaxis. 

97. The medicament, pharmaceutical composition, drug or composition of paragraph 96 
wherein the therapy or prophylaxis comprises inhibiting BACE or the production of 
Aβ or fragments thereof or the treatment of Alzheimer's disease. 

98. A method of inhibiting BACE or the production of Aβ or fragments thereof or 
treating Alzheimer's disease comprising administering the medicament, 
pharmaceutical composition, drug or composition of paragraph 96 to the patient. 

99. The method of paragraph 84, wherein the computer readable data is transmitted from 
the remote device. 

100. The method of paragraph 99, wherein the data is transmitted electronically or 
optically. 

Sequence Listings 

SEQ ID 1: shows the DNA sequence coding for the BACE protein, BACE WT. 

ATGGCTAGCATGACTGGTGGACAGCAAATGGGTCGCGGATCCATGGCGGGAGTGCTGCCT 
GCCCACGGCACCCAGCACGGCATCCGGCTGCCCCTGCGCAGCGGCCTGGGGGGCGCCCCC 
CTGGGGCTGCGGCTGCCCCGGGAGACCGACGAAGAGCCCGAGGAGCCCGGCCGGAGGGGC 
AGCTTTGTGGAGATGGTGGACAACCTGAGGGGCAAGTCGGGGCAGGGCTACTACGTGGAG 
ATGACCGTGGGCAGCCCCCCGCAGACGCTCAACATCCTGGTGGATACAGGCAGCAGTAAC 
TTTGCAGTGGGTGCTGCCCCCCACCCCTTCCTGCATCGCTACTACCAGAGGCAGCTGTGC 
AGCACATACCGGGACCTCCGGAAGGGTGTGTATGTGCCCTACACCCAGGGCAAGTGGGAA 
GGGGAGCTGGGCACCGACCTGGTAAGCATCCCCCATGGCCCCAACGTCACTGTGCGTGCC 
AACATTGCTGCCATCACTGAATCAGACAAGTTCTTCATCAACGGCTCCAACTGGGAAGGC 
ATCCTGGGGCTGGCCTATGCTGAGATTGCCAGGCCTGACGACTCCCTGGAGCCTTTCTTT 
GACTCTCTGGTAAAGCAGACCCACGTTCCCAAGCTCTTCTGCCTGCAGCTTTGTGGTGCT 
GGCTTCCCCCTCAACCAGTCTGAAGTGCTGGCCTCTGTCGGAGGGAGCATGATCATTGGA 
GGTATCGACCACTCGCTGTACACAGGCAGTCTCTGGTATACACCCATCCGGCGGGAGTGG 
TATTATGAGGTGATCATTGTGCGGGTGGAGATCAATGGACAGGATCTGAAAATGGACTGC 
AAGGAGTACAACTATGACAAGAGCATTGTGGACAGTGGCACCACCAACCTTCGTTTGCCC 
AAGAAAGTGTTTGAAGCTGCAGTCAAATCCATCAAGGCAGCCTCCTCCACGGAGAAGTTC 
CCTGATGGTTTCTGGCTAGGAGAGCAGCTGGTGTGCTGGCAAGCAGGCACCACCCCTTGG 
AACATTTTCCCAGTCATCTCACTCTACCTAATGGGTGAGGTTACCAACCAGTCCTTCCGC 
ATCACCATCGTTCCGCAGCAATACCTGCGGCCAGTGGAAGATGTGGCCACGTCCCAAGAC 
GACTGTTACAAGTTTGCCATCTCACAGTCATCCACGGGCACTGTTATGGGAGCTGTTATC 
ATGGAGGGCTTCTACGTTGTCTTTGATCGGGCCCGAAAACGAATTGGCTTTGCTGTCAGC 
GCTTGCCATGTGCACGATGAGTTCAGGACGGCAGCGGTGGAAGGCCCTTTTGTCACCTTG 
GACATGGAAGACTGTGGGTACAACATTCCACAGACAGATGAGTCATAA 

SEQ ID 2: shows the deduced amino acid sequence for BACE WT. 

MASMTGGQQMGRGSMAGVLPAHGTQHGIRLPLRSGLGGAPLGLRLPRETDEEPEEPGRRGSFVEMVDNLRGKSG 
QGYYVEMTVGSPPQTLNILVDTGSSNFAVGAAPHPFLHRYYQRQLSSTYRDLRKGVYVPYTQGKWEGELGTDLV 
SIPHGPNVTVRANIAAITESDKFFINGSNWEGILGLAYAEIARPDDSLEPFFDSLVKQTHVPNLFSLQLCGAGF 
PLNQSEVLASVGGSMIIGGIDHSLYTGSLWYTPIRREWYYEVIIVRVEINGQDLKMDCKEYNYDKSIVDSGTTN 
LRLPKKVFEAAVKSIKAASSTEKFPDGFWLGEQLVCWQAGTTPWNIFPVISLYLMGEVTNQSFRITILPQQYLR 
PVEDVATSQDDCYKFAISQSSTGTVMGAVIMEGFYVVFDRARKRIGFAVSACHVHDEFRTAAVEGPFVTLDMED 
CGYNIPQTDES 
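
By way of illustration only, the deduction of an amino acid sequence from a coding DNA sequence, as between SEQ ID 1 and SEQ ID 2, may be sketched as follows. The sketch assumes the standard genetic code, an open reading frame beginning at the first base, and input containing only A, C, G and T; the function name `translate` is hypothetical. The two test fragments are the first and last codons of SEQ ID 1.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate an in-frame coding sequence, stopping at the first stop codon."""
    seq = "".join(dna.split()).upper()  # drop line breaks and spaces
    protein = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]  # KeyError on any non-ACGT character
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# First 14 codons and last 6 codons of SEQ ID 1.
n_terminus = translate("ATGGCTAGCATGACTGGTGGACAGCAAATGGGTCGCGGATCC")
c_terminus = translate("CAGACAGATGAGTCATAA")
```

Because the lookup raises an error on any character other than A, C, G or T, the same function may also be used to flag transcription errors in a listed sequence.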

SEQ ID 3: shows the DNA sequence coding for the BACE protein, BACE N->Q. 

ATGGCTAGCATGACTGGTGGACAGCAAATGGGTCGCGGATCCATGGCGGGAGTGCTGCCT 
GCCCACGGCACCCAGCACGGCATCCGGCTGCCCCTGCGCAGCGGCCTGGGGGGCGCCCCC 
CTGGGGCTGCGGCTGCCCCGGGAGACCGACGAAGAGCCCGAGGAGCCCGGCCGGAGGGGC 
AGCTTTGTGGAGATGGTGGACAACCTGAGGGGCAAGTCGGGGCAGGGCTACTACGTGGAG 
ATGACCGTGGGCAGCCCCCCGCAGACGCTGAACATCCTGGTGGATACAGGCAGCAGTAAC 
TTTGCAGTGGGTGCTGCCCCCCACCCCTTCCTGCATCGCTACTACCAGAGGCAGCTGTCC 
AGCACATACCGGGACCTCCGGAAGGGTGTGTATGTGCCCTACACGCAGGGCAAGTGGGAA 
GGGGAGCTGGGCACCGACCTGGTAAGCATCCCCCATGGCCCCCAGGTCACTGTGCGTGCC 
AACATTGCTGCCATCACTGAATCAGACAAGTTCTTCATCCAGGGCTCCAACTGGGAAGGC 
ATCCTGGGGCTGGCCTATGCTGAGATTGCCAGGCCTGACGACTCCCTGGAGCCTTTCTTT 
GACTCTCTGGTAAAGCAGACCCACGTTCCCAAGCTCTTCTCCCTGCAGCTTTGTGGTGCT 
GGCTTCCCCCTCCAGCAGTCTGAAGTGCTGGCCTCTGTCGGAGGGAGCATGATCATTGGA 
GGTATCGACCACTCGCTGTACACAGGCAGTCTCTGGTATACACCGATCCGGCGGGAGTGG 
TATTATGAGGTGATCATTGTGCGGGTGGAGATCAATGGACAGGATCTGAAAATGGACTGC 
AAGGAGTACAACTATGACAAGAGCATTGTGGACAGTGGCACCACCAACCTTCGTTTGCCC 
AAGAAAGTGTTTGAAGCTGCAGTCAAATCCATCAAGGCAGCCTCCTCCACGGAGAAGTTC 
CCTGATGGTTTCTGGCTAGGAGAGCAGCTGGTGTGCTGGCAAGCAGGCACCACCCCTTGG 
AACATTTTCGCAGTCATCTCACTCTACCTAATGGGTGAGGTTACCCAGCAGTCCTTCCGC 
ATCACCATCCTTCCGCAGCAATACCTGCGGCCAGTGGAAGATGTGGCCACGTCCCAAGAC 
GACTGTTACAAGTTTGCCATCTCACAGTCATCCACGGGCACTGTTATGGGAGCTGTTATC 
ATGGAGGGCTTCTACGTTGTCTTTGATCGGGCCCGAAAACGAATTGGCTTTGCTGTCAGC 
GCTTGCCATGTGCACGATGAGTTCAGGACGGGAGCGGTGGAAGGCCCTTTTGTCACCTTG 
GACATGGAAGACTGTGGCTACAACATTCCACAGACAGATGAGTCACATCACCATCATCAC 
CACTAA 

SEQ ID 4: shows the deduced amino acid sequence for BACE N->Q. 

MASMTGGQQMGRGSMAGVLPAHGTQHGIRLPLRSGLGGAPLGLRLPRETDEEPEEPGRRGSFVEMVDNLRGKSG 
QGYYVEMTVGSPPQTLNILVDTGSSNFAVGAAPHPFLHRYYQRQLSSTYRDLRKGVYVPYTQGKWEGELGTDLV 
SIPHGPQVTVRANIAAITESDKFFIQGSNWEGILGLAYAEIARPDDSLEPFFDSLVKQTHVPNLFSLQLCGAGF 
PLQQSEVLASVGGSMIIGGIDHSLYTGSLWYTPIRREWYYEVIIVRVEINGQDLKMDCKEYNYDKSIVDSGTTN 
LRLPKKVFEAAVKSIKAASSTEKFPDGFWLGEQLVCWQAGTTPWNIFPVISLYLMGEVTQQSFRITILPQQYLR 
PVEDVATSQDDCYKFAISQSSTGTVMGAVIMEGFYVVFDRARKRIGFAVSACHVHDEFRTAAVEGPFVTLDMED 
CGYNIPQTDESHHHHHH 

SEQ ID 5: shows the DNA sequence coding for the BACE WT R56KR57K. 

ATGGCTAGCATGACTGGTGGACAGCAAATGGGTCGCGGATCCATGGCGGGAGTGCTGCCT 
GCCCAGGGCACCCAGCACGGCATCCGGCTGCCCCTGCGCAGCGGCCTGGGGGGCGCCCCC 
CTGGGGCTGCGGCTGCCCCGGGAGACCGACGAAGAGCCCGAGGAGCCCGGCAAGAAGGGC 
AGCTTTGTGGAGATGGTGGACAACCTGAGGGGCAAGTCGGGGCAGGGCTACTACGTGGAG 
ATGACCGTGGGCAGCCCCCCGCAGACGCTCAACATCCTGGTGGATACAGGCAGCAGTAAC 
TTTGCAGTGGGTGCTGCCCCCCACCCCTTCCTGCATCGCTACTACCAGAGGCAGCTGTCC 
AGCACATACCGGGACCTCCGGAAGGGTGTGTATGTGCCCTACACCCAGGGCAAGTGGGAA 
GGGGAGCTGGGCACCGACCTGGTAAGCATCCCCCATGGCCCCAACGTCACTGTGCGTGCC 
AACATTGCTGCCATCACTGAATCAGACAAGTTCTTCATCAACGGCTCCAACTGGGAAGGG 
ATCCTGGGGCTGGCCTATGCTGAGATTGCCAGGCCTGACGACTGCCTGGAGCCTTTCTTT 
GACTCTCTGGTAAAGCAGACCCACGTTCCCAACCTCTTCTCCCTGCAGCTTTGTGGTGCT 
GGCTTCCCCCTCAACCAGTCTGAAGTGCTGGCCTCTGTCGGAGGGAGCATGATCATTGGA 
GGTATCGACCACTCGCTGTACACAGGCAGTCTCTGGTATACACCCATCCGGCGGGAGTGG 
TATTATGAGGTGATCATTGTGCGGGTGGAGATCAATGGACAGGATCTGAAAATGGACTGC 
AAGGAGTACAACTATGACAAGAGCATTGTGGACAGTGGCACCACCAACCTTCGTTTGCCC 
AAGAAAGTGTTTGAAGCTGCAGTCAAATCCATCAAGGCAGCCTCCTCCACGGAGAAGTTC 
CCTGATGGTTTCTGGCTAGGAGAGCAGCTGGTGTGCTGGCAAGCAGGCACCACCCCTTGG 
AACATTTTCCCAGTCATCTCACTCTACCTAATGGGTGAGGTTACCAACCAGTCCTTCCGC 
ATCACCATCCTTCCGCAGCAATACCTGCGGCCAGTGGAAGATGTGGCCACGTCCCAAGAC 
GACTGTTACAAGTTTGCGATCTCACAGTCATCCACGGGCACTGTTATGGGAGCTGTTATC 
ATGGAGGGCTTCTACGTTGTCTTTGATCGGGCCCGAAAACGAATTGGCTTTGCTGTCAGC 
GCTTGCCATGTGCACGATGAGTTCAGGACGGCAGCGGTGGAAGGCCCTTTTGTCACCTTG 
GACATGGAAGACTGTGGCTACAACATTCCACAGACAGATGAGTCATAA 

SEQ ID 6: shows the deduced amino acid sequence for BACE WT R56KR57K 

MASMTGGQQMGRGSMAGVLPAHGTQHGIRLPLRSGLGGAPLGLRLPRETDEEPEEPGKKGSFVEMVDNLRGKSG 
QGYYVEMTVGSPPQTLNILVDTGSSNFAVGAAPHPFLHRYYQRQLSSTYRDLRKGVYVPYTQGKWEGELGTDLV 
SIPHGPNVTVRANIAAITESDKFFINGSNWEGILGLAYAEIARPDDSLEPFFDSLVKQTHVPNLFSLQLCGAGF 
PLNQSEVLASVGGSMIIGGIDHSLYTGSLWYTPIRREWYYEVIIVRVEINGQDLKMDCKEYNYDKSIVDSGTTN 
LRLPKKVFEAAVKSIKAASSTEKFPDGFWLGEQLVCWQAGTTPWNIFPVISLYLMGEVTNQSFRITILPQQYLR 
PVEDVATSQDDCYKFAISQSSTGTVMGAVIMEGFYVVFDRARKRIGFAVSACHVHDEFRTAAVEGPFVTLDMED 
CGYNIPQTDES 

SEQ ID 7: shows the DNA sequence coding for the BACE WT R57K. 

ATGGCTAGCATGACTGGTGGACAGCAAATGGGTCGCGGATCCATGGCGGGAGTGCTGCCT 

GCCCACGGCACCCAGCACGGCATCCGGCTGCCCCTGCGCAGCGGCCTGGGGGGCGCCCCC 

CTGGGGCTGCGGCTGCCCCGGGAGACCGACGAAGAGCCCGAGGAGCCCGGCCGGAAGGGC 

AGCTTTGTGGAGATGGTGGACAACCTGAGGGGCAAGTCGGGGCAGGGCTACTACGTGGAG 

ATGACCGTGGGCAGCCCCCCGCAGACGCTCAACATCCTGGTGGATACAGGCAGGAGTAAC 

TTTGCAGTGGGTGCTGCCCCCCACCCCTTCCTGCATCGCTACTACCAGAGGCAGCTGTCC 

AGCACATACCGGGACCTCCGGAAGGGTGTGTATGTGCCCTACACCCAGGGCAAGTGGGAA 

GGGGAGCTGGGCACCGACCTGGTAAGCATCCCCCATGGCCCCAACGTCACTGTGCGTGCC 

AACATTGCTGCCATCACTGAATCAGACAAGTTCTTCATCAACGGCTCCAACTGGGAAGGC 

ATCCTGGGGCTGGCCTATGCTGAGATTGCCAGGCCTGACGACTCCCTGGAGCCTTTCTTT 

GACTCTCTGGTAAAGCAGACCCACGTTCCCAACCTCTTCTCCCTGCAGCTTTGTGGTGCT 

GGCTTCCCCCTCAACCAGTCTGAAGTGCTGGCCTCTGTCGGAGGGAGCATGATCATTGGA 

GGTATCGACCACTCGCTGTACACAGGCAGTCTCTGGTATACACCCATCCGGCGGGAGTGG 

TATTATGAGGTGATCATTGTGCGGGTGGAGATCAATGGACAGGATCTGAAAATGGACTGC 

AAGGAGTACAACTATGACAAGAGCATTGTGGACAGTGGCACCACCAACCTTCGTTTGCCC 

AAGAAAGTGTTTGAAGCTGCAGTCAAATCCATCAAGGCAGCCTCCTCCACGGAGAAGTTC 

CCTGATGGTTTCTGGCTAGGAGAGCAGCTGGTGTGCTGGCAAGCAGGCACCACCCCTTGG 

AACATTTTCGCAGTCATCTCACTCTACCTAATGGGTGAGGTTACCAACCAGTCCTTCCGC 

ATCACCATCCTTCCGCAGCAATACCTGCGGCCAGTGGAAGATGTGGCCACGTCCCAAGAC 

GACTGTTACAAGTTTGCCATCTCACAGTCATCCACGGGCACTGTTATGGGAGCTGTTATC 

ATGGAGGGCTTCTACGTTGTCTTTGATCGGGCCCGAAAACGAATTGGCTTTGCTGTCAGC 

GCTTGCCATGTGCACGATGAGTTCAGGACGGCAGCGGTGGAAGGCCCTTTTGTCACCTTG 

GACATGGAAGACTGTGGCTACAACATTCCACAGACAGATGAGTCATAA 

SEQ ID 8: shows the deduced amino acid sequence for BACE WT R57K. 



MASMTGGQQMGRGSMAGVLPAHGTQHGIRLPLRSGLGGAPLGLRLPRETDEEPEEPGRKGSFVEMVDNLRGKSG 
QGYYVEMTVGSPPQTLNILVDTGSSNFAVGAAPHPFLHRYYQRQLSSTYRDLRKGVYVPYTQGKWEGELGTDLV 
SIPHGPNVTVRANIAAITESDKFFINGSNWEGILGLAYAEIARPDDSLEPFFDSLVKQTHVPNLFSLQLCGAGF 
PLNQSEVLASVGGSMIIGGIDHSLYTGSLWYTPIRREWYYEVIIVRVEINGQDLKMDCKEYNYDKSIVDSGTTN 
LRLPKKVFEAAVKSIKAASSTEKFPDGFWLGEQLVCWQAGTTPWNIFPVISLYLMGEVTNQSFRITILPQQYLR 
PVEDVATSQDDCYKFAISQSSTGTVMGAVIMEGFYVVFDRARKRIGFAVSACHVHDEFRTAAVEGPFVTLDMED 
CGYNIPQTDES 

SEQ ID 9: shows the DNA sequence coding for the BACE WT R57DEL. 

ATGGCTAGCATGACTGGTGGACAGCAAATGGGTCGCGGATCCATGGCGGGAGTGCTGCCT 

GCCCACGGCACCCAGCACGGCATCCGGCTGCCCCTGCGCAGCGGCCTGGGGGGCGCCCCC 

CTGGGGCTGCGGCTGCCCCGGGAGACCGACGAAGAGCCCGAGGAGCCCGGCAGGGGCAGC 

TTTGTGGAGATGGTGGACAACCTGAGGGGCAAGTCGGGGCAGGGCTACTACGTGGAGATG 

ACCGTGGGCAGCCCCCCGCAGACGCTCAACATCCTGGTGGATACAGGCAGCAGTAACTTT 

GCAGTGGGTGCTGCCCCCCACCCCTTCCTGCATCGCTACTACCAGAGGCAGCTGTCCAGC 

ACATACCGGGACCTCCGGAAGGGTGTGTATGTGCCCTACACCCAGGGCAAGTGGGAAGGG 

GAGCTGGGCACCGACCTGGTAAGCATCCCCCATGGCCCCAACGTCACTGTGCGTGCCAAC 

ATTGCTGCCATCACTGAATCAGACAAGTTCTTCATCAACGGCTCCAACTGGGAAGGCATC 

CTGGGGCTGGCCTATGCTGAGATTGCCAGGCCTGACGACTCCCTGGAGCCTTTCTTTGAC 

TCTCTGGTAAAGCAGACCCACGTTCCCAACCTCTTCTCCCTGCAGCTTTGTGGTGCTGGC 

TTCCCCCTCAACCAGTCTGAAGTGCTGGCCTCTGTCGGAGGGAGCATGATCATTGGAGGT 

ATCGACCACTCGCTGTACACAGGCAGTCTCTGGTATACACCGATCCGGCGGGAGTGGTAT 

TATGAGGTGATCATTGTGCGGGTGGAGATCAATGGACAGGATCTGAAAATGGACTGCAAG 

GAGTACAACTATGACAAGAGCATTGTGGACAGTGGCACGACCAACCTTCGTTTGCCCAAG 

AAAGTGTTTGAAGCTGCAGTCAAATCCATCAAGGCAGCCTCCTCCACGGAGAAGTTCCCT 

GATGGTTTCTGGCTAGGAGAGCAGCTGGTGTGCTGGCAAGCAGGCACCACCCCTTGGAAC 

ATTTTCCCAGTCATCTCACTCTACCTAATGGGTGAGGTTACCAACCAGTCCTTCCGCATC 
ACCATCCTTCCGCAGCAATACCTGCGGCCAGTGGAAGATGTGGCCACGTCCCAAGACGAC 
TGTTACAAGTTTGCCATCTCACAGTCATCCACGGGCACTGTTATGGGAGCTGTTATCATG 
GAGGGCTTCTACGTTGTCTTTGATCGGGCCCGAAAACGAATTGGCTTTGCTGTCAGCGCT 
TGCCATGTGCAGGATGAGTTCAGGACGGCAGCGGTGGAAGGCCCTTTTGTCACCTTGGAC 
ATGGAAGACTGTGGCTACAACATTCCACAGACAGATGAGTCATAA 

SEQ ID 10: shows the deduced amino acid sequence for BACE WT R57del. 

MASMTGGQQMGRGSMAGVLPAHGTQHGIRLPLRSGLGGAPLGLRLPRETDEEPEEPGRGSFVEMVDNLRGKSGQ 
GYYVEMTVGSPPQTLNILVDTGSSNFAVGAAPHPFLHRYYQRQLSSTYRDLRKGVYVPYTQGKWEGELGTDLVS 
IPHGPNVTVRANIAAITESDKFFINGSNWEGILGLAYAEIARPDDSLEPFFDSLVKQTHVPNLFSLQLCGAGFP 
LNQSEVLASVGGSMIIGGIDHSLYTGSLWYTPIRREWYYEVIIVRVEINGQDLKMDCKEYNYDKSIVDSGTTNL 
RLPKKVFEAAVKSIKAASSTEKFPDGFWLGEQLVCWQAGTTPWNIFPVISLYLMGEVTNQSFRITILPQQYLRP 
VEDVATSQDDCYKFAISQSSTGTVMGAVIMEGFYVVFDRARKRIGFAVSACHVHDEFRTAAVEGPFVTLDMEDC 
GYNIPQTDES 

SEQ ID 11: shows the DNA sequence coding for the BACE N->Q R56KR57K. 

ATGGCTAGCATGACTGGTGGACAGCAAATGGGTCGCGGATCCATGGCGGGAGTGCTGCCT 
GCCCACGGCACCCAGCACGGCATCCGGCTGCCCCTGCGCAGCGGCCTGGGGGGCGCCCCC 
CTGGGGCTGCGGCTGCCCCGGGAGACCGACGAAGAGCCCGAGGAGCCCGGCAAGAAGGGC 
AGCTTTGTGGAGATGGTGGACAACCTGAGGGGCAAGTCGGGGCAGGGCTACTACGTGGAG 
ATGACCGTGGGCAGCCCCCCGCAGACGCTCAACATCCTGGTGGATACAGGCAGCAGTAAC 
TTTGCAGTGGGTGCTGCCCCCCACCCGTTCCTGCATCGCTACTACCAGAGGCAGCTGTGC 
AGCACATACCGGGACCTCCGGAAGGGTGTGTATGTGCCGTACACCCAGGGCAAGTGGGAA 
GGGGAGCTGGGCACCGACCTGGTAAGCATCCCCCATGGCCCCCAGGTCACTGTGCGTGCG 
AACATTGCTGCCATCACTGAATCAGACAAGTTCTTCATCCAGGGCTCCAACTGGGAAGGC 
ATCCTGGGGCTGGCCTATGCTGAGATTGCCAGGCCTGACGACTCCCTGGAGCCTTTCTTT 
GACTCTCTGGTAAAGCAGACCCACGTTCCCAACCTCTTCTCCCTGCAGCTTTGTGGTGCT 
GGCTTCCCCCTCCAGCAGTCTGAAGTGCTGGCCTCTGTCGGAGGGAGGATGATCATTGGA 
GGTATCGACCACTCGCTGTACACAGGCAGTCTCTGGTATACACCCATCCGGCGGGAGTGG 
TATTATGAGGTGATCATTGTGCGGGTGGAGATCAATGGACAGGATCTGAAAATGGACTGC 
AAGGAGTACAACTATGACAAGAGCATTGTGGACAGTGGCACCACCAACCTTCGTTTGCCC 
AAGAAAGTGTTTGAAGCTGCAGTCAAATCCATCAAGGCAGCCTCCTCCACGGAGAAGTTC 
CCTGATGGTTTCTGGCTAGGAGAGCAGCTGGTGTGCTGGCAAGCAGGCACCACCCCTTGG 
AACATTTTCCCAGTCATCTCACTCTACCTAATGGGTGAGGTTACCCAGCAGTCCTTCCGC 
ATCACCATCCTTCCGCAGCAATACCTGCGGCCAGTGGAAGATGTGGCCACGTCCCAAGAC 
GACTGTTACAAGTTTGCCATCTCACAGTCATCCACGGGCACTGTTATGGGAGCTGTTATC 
ATGGAGGGCTTCTACGTTGTCTTTGATCGGGCCCGAAAACGAATTGGCTTTGCTGTCAGC 
GGTTGCGATGTGCACGATGAGTTCAGGACGGCAGCGGTGGAAGGCCCTTTTGTCACCTTG 
GACATGGAAGACTGTGGCTACAACATTCCACAGACAGATGAGTCACATCACCATCATCAC 
CACTAA 

SEQ ID 12: shows the deduced amino acid sequence for BACE N->Q R56KR57K 



MASMTGGQQMGRGSMAGVLPAHGTQHGIRLPLRSGLGGAPLGLRLPRETDEEPEEPGKKGSFVEMVDNLRGKSG 
QGYYVEMTVGSPPQTLNILVDTGSSNFAVGAAPHPFLHRYYQRQLSSTYRDLRKGVYVPYTQGKWEGELGTDLV 
SIPHGPQVTVRANIAAITESDKFFIQGSNWEGILGLAYAEIARPDDSLEPFFDSLVKQTHVPNLFSLQLCGAGF 
PLQQSEVLASVGGSMIIGGIDHSLYTGSLWYTPIRREWYYEVIIVRVEINGQDLKMDCKEYNYDKSIVDSGTTN 
LRLPKKVFEAAVKSIKAASSTEKFPDGFWLGEQLVCWQAGTTPWNIFPVISLYLMGEVTQQSFRITILPQQYLR 
PVEDVATSQDDCYKFAISQSSTGTVMGAVIMEGFYVVFDRARKRIGFAVSACHVHDEFRTAAVEGPFVTLDMED 
CGYNIPQTDESHHHHHH 

SEQ ID 13: shows the DNA sequence coding for the BACE N->Q R56KR57K no His. 

ATGGCTAGCATGACTGGTGGACAGCAAATGGGTCGCGGATCCATGGCGGGAGTGCTGCCT 
GCCCACGGCACCCAGCACGGCATCCGGCTGCCCCTGCGCAGCGGGCTGGGGGGCGCGCCC 
CTGGGGCTGCGGCTGCCCCGGGAGACCGACGAAGAGCCCGAGGAGCCCGGCAAGAAGGGC 
AGCTTTGTGGAGATGGTGGACAACCTGAGGGGCAAGTCGGGGCAGGGCTACTACGTGGAG 
ATGACCGTGGGCAGCCCCCCGCAGACGCTCAAGATCCTGGTGGATACAGGCAGGAGTAAC 
TTTGCAGTGGGTGCTGCCCCCCACCCCTTCCTGCATCGCTAGTACCAGAGGGAGCTGTCC 
AGCACATACCGGGACCTCCGGAAGGGTGTGTATGTGGCCTACACCCAGGGCAAGTGGGAA 
GGGGAGCTGGGCACCGACCTGGTAAGCATCCCCCATGGCCCCCAGGTCACTGTGCGTGCC 
AACATTGCTGCCATCACTGAATGAGACAAGTTCTTCATCCAGGGCTCCAACTGGGAAGGC 
ATCCTGGGGCTGGCCTATGCTGAGATTGCCAGGCCTGACGACTCCCTGGAGCCTTTCTTT 
GACTCTCTGGTAAAGCAGACCCACGTTCCCAACCTCTTCTCCCTGCAGCTTTGTGGTGCT 
GGCTTCCCCCTCCAGCAGTCTGAAGTGCTGGCCTCTGTCGGAGGGAGCATGATCATTGGA 
GGTATCGACCACTCGCTGTACACAGGCAGTCTCTGGTATACACCCATCCGGCGGGAGTGG 
TATTATGAGGTGATCATTGTGCGGGTGGAGATCAATGGACAGGATCTGAAAATGGACTGC 
AAGGAGTACAACTATGACAAGAGCATTGTGGACAGTGGCACCACCAACCTTCGTTTGCCG 
AAGAAAGTGTTTGAAGCTGCAGTCAAATCCATCAAGGCAGCCTCCTCCACGGAGAAGTTC 
CCTGATGGTTTCTGGCTAGGAGAGCAGCTGGTGTGCTGGCAAGCAGGCACCACGCCTTGG 
AAGATTTTCCCAGTCATCTCACTCTACCTAATGGGTGAGGTTACCCAGCAGTCCTTCCGC 
ATCACCATCCTTCCGCAGCAATACCTGCGGCCAGTGGAAGATGTGGCCACGTCCCAAGAC 
GACTGTTACAAGTTTGCCATCTCACAGTCATCCACGGGCACTGTTATGGGAGCTGTTATC 
ATGGAGGGCTTCTACGTTGTCTTTGATCGGGCCCGAAAACGAATTGGCTTTGCTGTCAGC 
GCTTGCCATGTGCACGATGAGTTCAGGACGGCAGCGGTGGAAGGCCCTTTTGTCACCTTG 
GACATGGAAGACTGTGGCTACAACATTCCACAGACAGATGAGTCATAG 

SEQ ID 14: shows the deduced amino acid sequence for BACE N->Q R56KR57K no His 

MASMTGGQQMGRGSMAGVLPAHGTQHGIRLPLRSGLGGAPLGLRLPRETDEEPEEPGKKGSFVEMVDNLRGKSG 
QGYYVEMTVGSPPQTLNILVDTGSSNFAVGAAPHPFLHRYYQRQLSSTYRDLRKGVYVPYTQGKWEGELGTDLV 
SIPHGPQVTVRANIAAITESDKFFIQGSNWEGILGLAYAEIARPDDSLEPFFDSLVKQTHVPNLFSLQLCGAGF 
PLQQSEVLASVGGSMIIGGIDHSLYTGSLWYTPIRREWYYEVIIVRVEINGQDLKMDCKEYNYDKSIVDSGTTN 
LRLPKKVFEAAVKSIKAASSTEKFPDGFWLGEQLVCWQAGTTPWNIFPVISLYLMGEVTQQSFRITILPQQYLR 
PVEDVATSQDDCYKFAISQSSTGTVMGAVIMEGFYVVFDRARKRIGFAVSACHVHDEFRTAAVEGPFVTLDMED 
CGYNIPQTDES 

SEQ ID 15: shows the DNA sequence coding for the BACE N->Q R57K. 

ATGGCTAGCATGACTGGTGGACAGCAAATGGGTCGCGGATCCATGGCGGGAGTGCTGCCT 

GCCCACGGCACCCAGCACGGCATCCGGCTGCCCCTGCGCAGCGGCCTGGGGGGCGCCCCC 

CTGGGGCTGCGGCTGCCCCGGGAGACCGACGAAGAGCCCGAGGAGCCCGGCCGGAAGGGC 

AGCTTTGTGGAGATGGTGGACAACCTGAGGGGCAAGTCGGGGCAGGGCTACTACGTGGAG 

ATGACCGTGGGCAGCCCCCCGCAGACGCTCAACATCCTGGTGGATACAGGCAGCAGTAAC 

TTTGCAGTGGGTGCTGCCCCCCACCCCTTCGTGCATCGCTACTACCAGAGGCAGCTGTCC 

AGCACATACCGGGACCTCCGGAAGGGTGTGTATGTGCCCTACACCCAGGGCAAGTGGGAA 

GGGGAGCTGGGCACCGACCTGGTAAGCATCCCCGATGGCCGCCAGGTCACTGTGCGTGCC 

AACATTGCTGCCATCACTGAATCAGACAAGTTCTTCATCCAGGGCTCCAACTGGGAAGGC 

ATCCTGGGGCTGGCCTATGCTGAGATTGCCAGGGGTGACGACTCCCTGGAGCCTTTCTTT 

GACTCTGTGGTAAAGCAGACCCACGTTCCCAACCTCTTCTCCCTGCAGCTTTGTGGTGCT 

GGCTTCCCCCTCCAGCAGTCTGAAGTGCTGGCCTCTGTCGGAGGGAGCATGATCATTGGA 

GGTATCGACCACTCGCTGTACACAGGCAGTCTCTGGTATACACCCATCCGGCGGGAGTGG 

TATTATGAGGTGATCATTGTGCGGGTGGAGATCAATGGACAGGATCTGAAAATGGAGTGC 

AAGGAGTACAACTATGACAAGAGCATTGTGGACAGTGGCACCACCAACCTTCGTTTGCCC 

AAGAAAGTGTTTGAAGCTGCAGTCAAATCCATCAAGGCAGCCTCCTCCACGGAGAAGTTC 

CCTGATGGTTTCTGGCTAGGAGAGCAGCTGGTGTGCTGGCAAGCAGGCACCACCCCTTGG 

AACATTTTCCGAGTCATCTCACTCTACCTAATGGGTGAGGTTACCCAGCAGTCCTTCCGC 

ATCACCATCCTTCCGCAGCAATACCTGCGGCCAGTGGAAGATGTGGCCACGTCCCAAGAC 

GACTGTTACAAGTTTGCCATCTCACAGTCATCCACGGGCACTGTTATGGGAGCTGTTATC 

ATGGAGGGGTTCTACGTTGTCTTTGATCGGGCCCGAAAACGAATTGGCTTTGCTGTGAGC 

GCTTGCCATGTGCACGATGAGTTCAGGACGGCAGCGGTGGAAGGCCCTTTTGTCACCTTG 

GACATGGAAGACTGTGGCTACAACATTCCACAGACAGATGAGTCACATCACCATCATCAC 
CACTAA 

SEQ ID 16: shows the deduced amino acid sequence for BACE N->Q R57K 



MASMTGGQQMGRGSMAGVLPAHGTQHGIRLPLRSGLGGAPLGLRLPRETDEEPEEPGRKGSFVEMVDNLRGKSG 
QGYYVEMTVGSPPQTLNILVDTGSSNFAVGAAPHPFLHRYYQRQLSSTYRDLRKGVYVPYTQGKWEGELGTDLV 
SIPHGPQVTVRANIAAITESDKFFIQGSNWEGILGLAYAEIARPDDSLEPFFDSLVKQTHVPNLFSLQLCGAGF 
PLQQSEVLASVGGSMIIGGIDHSLYTGSLWYTPIRREWYYEVIIVRVEINGQDLKMDCKEYNYDKSIVDSGTTN 
LRLPKKVFEAAVKSIKAASSTEKFPDGFWLGEQLVCWQAGTTPWNIFPVISLYLMGEVTQQSFRITILPQQYLR 
PVEDVATSQDDCYKFAISQSSTGTVMGAVIMEGFYVVFDRARKRIGFAVSACHVHDEFRTAAVEGPFVTLDMED 
CGYNIPQTDESHHHHHH 



SEQ ID 17: shows the DNA sequence coding for the BACE N->Q R57DEL. 



ATGGCTAGCATGACTGGTGGACAGCAAATGGGTCGCGGATCCATGGCGGGAGTGCTGCCTGCCCACGGCACCCA 
GCACGGCATCCGGCTGCCCCTGCGCAGCGGCCTGGGGGGCGCCCCCCTGGGGCTGCGGCTGCCCCGGGAGAGCG 
ACGAAGAGCCCGAGGAGCCCGGCAGGGGCAGCTTTGTGGAGATGGTGGACAACCTGAGGGGCAAGTCGGGGCAG 
GGCTACTACGTGGAGATGACCGTGGGCAGCCCCCCGCAGACGCTCAACATCCTGGTGGATACAGGCAGCAGTAA 
CTTTGCAGTGGGTGCTGCCCCCCACCCCTTCCTGCATCGCTACTACCAGAGGCAGCTGTCCAGCACATACCGGG 
ACCTCCGGAAGGGTGTGTATGTGCCCTACACCCAGGGCAAGTGGGAAGGGGAGCTGGGCACCGACCTGGTAAGC 
ATCCCCCATGGGCCCCAGGTCACTGTGCGTGCCAACATTGCTGCCATCACTGAATCAGACAAGTTCTTCATCCA 
GGGCTCCAACTGGGAAGGCATCCTGGGGCTGGCCTATGCTGAGATTGCCAGGCCTGACGACTCCCTGGAGCCTT 
TCTTTGACTCTCTGGTAAAGCAGACCCACGTTCCCAACCTCTTCTCCCTGCAGCTTTGTGGTGCTGGCTTCGCC 
CTCCAGCAGTCTGAAGTGCTGGCCTCTGTCGGAGGGAGCATGATCATTGGAGGTATCGACCACTCGCTGTACAC 
AGGCAGTCTCTGGTATACACCCATCCGGCGGGAGTGGTATTATGAGGTGATCATTGTGCGGGTGGAGATCAATG 
GACAGGATCTGAAAATGGACTGCAAGGAGTACAACTATGACAAGAGCATTGTGGACAGTGGCACCACCAACCTT 
GGTTTGGCCAAGAAAGTGTTTGAAGCTGCAGTCAAATCCATCAAGGCAGCCTCCTCCACGGAGAAGTTCCCTGA 
TGGTTTCTGGCTAGGAGAGCAGCTGGTGTGCTGGCAAGCAGGGACCACGCCTTGGAACATTTTCCCAGTCATCT 
CACTCTACCTAATGGGTGAGGTTACCCAGCAGTCCTTCCGCATCACCATCCTTCCGCAGCAATACCTGCGGCCA 
GTGGAAGATGTGGCGACGTCCCAAGACGACTGTTACAAGTTTGCCATCTCACAGTCATCCACGGGCACTGTTAT 
GGGAGCTGTTATCATGGAGGGCTTCTACGTTGTCTTTGATCGGGCCCGAAAACGAATTGGCTTTGCTGTCAGCG 
CTTGCCATGTGCACGATGAGTTCAGGACGGCAGCGGTGGAAGGCCCTTTTGTCACCTTGGACATGGAAGACTGT 
GGCTACAACATTCCACAGACAGATGAGTCACATCACCATCATCACCACTAA 

SEQ ID 18: shows the deduced amino acid sequence for BACE N->Q R57del 



MASMTGGQQMGRGSMAGVLPAHGTQHGIRLPLRSGLGGAPLGLRLPRETDEEPEEPGRGSFVEMVDNLRGKSGQ 
GYYVEMTVGSPPQTLNILVDTGSSNFAVGAAPHPFLHRYYQRQLSSTYRDLRKGVYVPYTQGKWEGELGTDLVS 
IPHGPQVTVRANIAAITESDKFFIQGSNWEGILGLAYAEIARPDDSLEPFFDSLVKQTHVPNLFSLQLCGAGFP 
LQQSEVLASVGGSMIIGGIDHSLYTGSLWYTPIRREWYYEVIIVRVEINGQDLKMDCKEYNYDKSIVDSGTTNL 
RLPKKVFEAAVKSIKAASSTEKFPDGFWLGEQLVCWQAGTTPWNIFPVISLYLMGEVTQQSFRITILPQQYLRP 
VEDVATSQDDCYKFAISQSSTGTVMGAVIMEGFYVVFDRARKRIGFAVSACHVHDEFRTAAVEGPFVTLDMEDC 
GYNIPQTDESHHHHHH 

SEQ ID 19: shows the amino acid sequence of BACE WT R56KR57K crystallised. 

LPRETDEEPEEPGKKGSFVEMVDNLRGKSGQGYYVEMTVGSPPQTLNILVDTGSSNFAVGAAPHPFLHRYYQRQ 
LSSTYRDLRKGVYVPYTQGKWEGELGTDLVSIPHGPNVTVRANIAAITESDKFFINGSNWEGILGLAYAEIARP 
DDSLEPFFDSLVKQTHVPNLFSLQLCGAGFPLNQSEVLASVGGSMIIGGIDHSLYTGSLWYTPIRREWYYEVII 
VRVEINGQDLKMDCKEYNYDKSIVDSGTTNLRLPKKVFEAAVKSIKAASSTEKFPDGFWLGEQLVCWQAGTTPW 
NIFPVISLYLMGEVTNQSFRITILPQQYLRPVEDVATSQDDCYKFAISQSSTGTVMGAVIMEGFYVVFDRARKR 
IGFAVSACHVHDEFRTAAVEGPFVTLDMEDCGYNIPQTDES 

SEQ ID 20: shows the amino acid sequence of BACE N->Q R56KR57K no His as 
crystallised. 

LPRETDEEPEEPGKKGSFVEMVDNLRGKSGQGYYVEMTVGSPPQTLNILVDTGSSNFAVGAAPHPFLHRYYQRQ 
LSSTYRDLRKGVYVPYTQGKWEGELGTDLVSIPHGPQVTVRANIAAITESDKFFIQGSNWEGILGLAYAEIARP 
DDSLEPFFDSLVKQTHVPNLFSLQLCGAGFPLQQSEVLASVGGSMIIGGIDHSLYTGSLWYTPIRREWYYEVII 
VRVEINGQDLKMDCKEYNYDKSIVDSGTTNLRLPKKVFEAAVKSIKAASSTEKFPDGFWLGEQLVCWQAGTTPW 
NIFPVISLYLMGEVTQQSFRITILPQQYLRPVEDVATSQDDCYKFAISQSSTGTVMGAVIMEGFYVVFDRARKR 
IGFAVSACHVHDEFRTAAVEGPFVTLDMEDCGYNIPQTDES 

SEQ ID 21: shows the amino acid sequence of BACE N->Q R56KR57K crystallised. 

LPRETDEEPEEPGKKGSFVEMVDNLRGKSGQGYYVEMTVGSPPQTLNILVDTGSSNFAVGAAPHPFLHRYYQRQ 
LSSTYRDLRKGVYVPYTQGKWEGELGTDLVSIPHGPQVTVRANIAAITESDKFFIQGSNWEGILGLAYAEIARP 
DDSLEPFFDSLVKQTHVPNLFSLQLCGAGFPLQQSEVLASVGGSMIIGGIDHSLYTGSLWYTPIRREWYYEVII 
VRVEINGQDLKMDCKEYNYDKSIVDSGTTNLRLPKKVFEAAVKSIKAASSTEKFPDGFWLGEQLVCWQAGTTPW 
NIFPVISLYLMGEVTQQSFRITILPQQYLRPVEDVATSQDDCYKFAISQSSTGTVMGAVIMEGFYVVFDRARKR 
IGFAVSACHVHDEFRTAAVEGPFVTLDMEDCGYNIPQTDESHHHHHH 